Abstract - This paper focuses on sparse estimation in the
situation where both the sensing matrix and the measurement
vector are corrupted by additive Gaussian noise. The performance
bound of sparse estimation is analyzed and discussed
in depth. Two types of lower bounds, the constrained Cramér-Rao
bound (CCRB) and the Hammersley-Chapman-Robbins
bound (HCRB), are discussed. It is shown that the situation
with sensing matrix perturbation is more complex than the
one with only measurement noise. For the CCRB, a closed-form
expression is derived. It demonstrates a gap between
the maximal and nonmaximal support cases. It is also revealed
that a gap lies between the CCRB and the MSE of the oracle
pseudoinverse estimator, but it approaches zero asymptotically
as the problem dimensions tend to infinity. For the tighter
bound, the HCRB, a simple expression is difficult to obtain
for a general sensing matrix, so a closed-form expression
is derived for the unit sensing matrix case as a qualitative study
of the performance bound. It is shown that the gap between
the maximal and nonmaximal cases is eliminated for the HCRB.
Numerical simulations are performed to verify the theoretical
results.

Index Terms - Sparsity, unbiased estimation, constrained
Cramér-Rao bound, Hammersley-Chapman-Robbins bound,
sensing matrix perturbation, asymptotic behavior.



I. Introduction 

The problem of sparse recovery from linear measurements
has drawn a great deal of attention in recent years.
Various practical algorithms for sparse recovery
have been proposed and theoretical results have been derived
[1]-[10]. The theory of sparse recovery can be applied to
various fields, especially compressive sensing,
which considerably decreases the sampling rate of sparse
signals [11]-[13].

Suppose that a sparse signal $\mathbf{x} \in \mathbb{R}^n$ is observed through
the noisy linear measurement

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n},$$ (1)

where $\mathbf{A} \in \mathbb{R}^{m \times n}$ is called the sensing matrix, $\mathbf{y} \in \mathbb{R}^m$ is the
measurement vector, and $\mathbf{n} \in \mathbb{R}^m$ is the additive random noise
vector. The main issue of sparse recovery is to estimate x from the
measurement y with an estimation error as small as possible, and
the recovery algorithm should be computationally tractable.

The performance of various recovery algorithms in noisy
scenarios has been theoretically analyzed [6]-[8], [14]-[18].
Most of these works consider only upper bounds on the
estimation error. Theoretical results on how small the
estimation error can be (i.e. theoretical lower bounds
on the estimation error) are of great interest because they set a
performance limit that no sparse recovery algorithm can
exceed. There are various approaches to this
topic. Reference [19] employed a minimax approach to study
the problem. Another approach is to reformulate the sparse
recovery problem as a parameter estimation problem [20].
The sparse vector x is viewed as a deterministic parameter
vector, and y represents the observation data. The goal of this
approach is to minimize the mean-squared error (MSE)
$E[\|\hat{\mathbf{x}} - \mathbf{x}\|_{\ell_2}^2]$ among all possible estimators
$\hat{\mathbf{x}} = \hat{\mathbf{x}}(\mathbf{y})$. The theory of
lower bounds on the MSE has been well established for a parameter
vector $\mathbf{x} \in \mathbb{R}^n$ without further constraints [21]. Various
bounds, including the Cramér-Rao bound [21], [22], the
Hammersley-Chapman-Robbins bound [23], and the Barankin
bound [25], have been introduced. However, the classical
theory in general requires some modification to adapt to the
sparse setting.

Recently, research on lower bounds on the MSE for constrained
parameter vectors, especially sparse parameter vectors,
has been developed. The Cramér-Rao bound has been
modified for the constrained parameter case, and works well
in the sparse setting [20], [26]. The Hammersley-Chapman-Robbins
bound requires little essential modification, and has
also been applied to the problem of sparse recovery [27], [28].

This paper also focuses on the theoretical lower bounds
of sparse estimators and employs the constrained Cramér-Rao
bound and the Hammersley-Chapman-Robbins bound, but
deals with a more general setting in which the sensing matrix
is perturbed by additive random noise. Perturbed sensing
matrices appear in many practical scenarios, and therefore
it is necessary to study the theoretical bounds of sparse
recovery with a perturbed sensing matrix [29]-[31]. One of
the consequences of a perturbed sensing matrix is that it acts as
a kind of multiplicative noise: the total noise on the
measurement vector depends on the parameter vector
x, which adds potential complexity compared to the
setting without sensing matrix perturbation [20].

The main contributions of this work are theoretical
bounds for sparse recovery with a perturbed sensing matrix
and a noisy measurement vector. Closed-form expressions of
the constrained CRB are derived, and their quantitative
behavior is discussed. For the Hammersley-Chapman-Robbins
bound, only the case of an identity sensing matrix is
studied for the sake of simplicity, but the results are still
instructive in that the analysis is much simpler and still
provides much information about the behavior of the theoretical
lower bounds when the noise is large.

The rest of this paper is organized as follows. In Section
II, the fundamental problem of sparse recovery with a perturbed
sensing matrix is introduced, and the classical theory of parameter
estimation is reviewed. In Section III, the constrained
Cramér-Rao bound is derived, and quantitative analysis
is provided in order to give a deeper understanding
of its behavior. In Section IV, the Hammersley-Chapman-Robbins
bound is derived for the case of a unit sensing
matrix, and its behavior under different settings of signals and
noise is also studied. In Section V, numerical results
are presented to verify the theoretical results. The paper
is concluded in Section VI, and the proofs are postponed to the
Appendices.

Notation 

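The $M \times M$ identity matrix is denoted by $\mathbf{I}_M$. For any
index set $\Lambda \subseteq \{1, 2, \ldots, N\}$, $|\Lambda|$ denotes the cardinality, i.e.
the number of elements of $\Lambda$, and $\Lambda^c$ denotes the complement
set $\{1, 2, \ldots, N\} \setminus \Lambda$. For any index set $\Lambda$ and any
$N$-dimensional vector $\mathbf{v}$ ($N \ge |\Lambda|$), $\mathbf{v}_\Lambda$ denotes the $|\Lambda|$-length
vector containing the entries of $\mathbf{v}$ indexed by $\Lambda$. For any index
set $\Lambda$ and any $M \times N$ matrix $\mathbf{M}$ ($N \ge |\Lambda|$), $\mathbf{M}_\Lambda$ denotes the
$M \times |\Lambda|$ matrix containing the columns of $\mathbf{M}$ corresponding
to $\Lambda$. For any vector $\mathbf{v}$, $\|\mathbf{v}\|_{\ell_p}$ denotes the $\ell_p$-norm of $\mathbf{v}$. For
any appropriate matrix $\mathbf{M}$, $\mathbf{M}^\dagger$ denotes the Moore-Penrose
pseudo-inverse of $\mathbf{M}$. For $\mathbf{x} = (x_1, \ldots, x_N)^T$, $\nabla_{\mathbf{x}}$ denotes
the gradient operator $(\partial/\partial x_1, \ldots, \partial/\partial x_N)^T$, and $\nabla_{\mathbf{x}}^T$ denotes
its transpose. $\mathbf{e}_k$ denotes the $k$th column vector of the
identity matrix. $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes the (multidimensional) Gaussian
distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. Other notation
will be introduced when needed.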

II. Problem Setting 

The fundamental background of sparse estimation with gen- 
eral perturbation, i.e. where both sensing matrix perturbation 
and measurement noise exist, is introduced in this section. In 
the case of general perturbation, the measurement vector is 
observed via a corrupted sensing matrix as 

y = (A + E)x + n, (2) 

where x is the deterministic parameter to be estimated, and
y is the measurement vector. $\mathbf{E} \in \mathbb{R}^{m \times n}$ represents the
perturbation on the sensing matrix, whose elements are i.i.d.
Gaussian random variables with zero mean and
variance $\sigma_e^2$. The vector $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{I}_m)$ is the noise on
the measurement vector y, and is independent of E.
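
As a quick sanity check of this model, the total noise Ex + n can be simulated directly. Each entry of Ex is a sum of independent zero-mean Gaussians, so each entry of Ex + n should have variance $\sigma_e^2\|\mathbf{x}\|_{\ell_2}^2 + \sigma_n^2$, which depends on x. The sketch below is only an illustration: the parameter vector, noise levels, number of rows and the helper names are all hypothetical choices, not values from the paper.

```python
import random

def simulate_total_noise(x, m, sigma_e, sigma_n, trials, seed=0):
    """Draw samples of the entries of E x + n for the model y = (A + E) x + n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        for _ in range(m):
            # One row of E applied to x, plus one entry of n.
            row_dot_x = sum(rng.gauss(0.0, sigma_e) * xi for xi in x)
            samples.append(row_dot_x + rng.gauss(0.0, sigma_n))
    return samples

def empirical_variance(samples):
    """Unbiased sample variance."""
    mean = sum(samples) / len(samples)
    return sum((v - mean) ** 2 for v in samples) / (len(samples) - 1)

x = [1.0, 0.0, -2.0, 0.0, 0.5]        # hypothetical sparse parameter vector
sigma_e, sigma_n, m = 0.3, 0.5, 4     # hypothetical noise levels and row count
predicted = sigma_e ** 2 * sum(xi ** 2 for xi in x) + sigma_n ** 2
observed = empirical_variance(
    simulate_total_noise(x, m, sigma_e, sigma_n, trials=5000))
print(predicted, observed)
```

The empirical variance should agree with the predicted value up to Monte Carlo error, which illustrates why the perturbation E behaves like multiplicative noise.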

The parameter x is assumed to be sparse, i.e. the size of
its support is far less than its dimension. The support of x is
denoted by S, and its size is assumed to be no greater than
s, i.e.

$$|S| = \|\mathbf{x}\|_{\ell_0} \le s.$$ (3)

Furthermore, it is assumed in the following text that

$$\mathrm{spark}(\mathbf{A}) > 2s,$$ (4)

where spark(A) is defined as the smallest number k
such that there exists a subset of k columns of A that
are linearly dependent [32]. The above prerequisite ensures
that two different s-sparse signals will not share the same
measurement vector if the measurement is precise.
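
For small matrices, the spark can be computed by brute force straight from this definition. The following is a minimal sketch, assuming a dense matrix stored as a list of rows; the helper names `rank` and `spark` and the example matrix are illustrative only. The search is exponential in the number of columns, so it is meant for toy examples rather than practical sensing matrices.

```python
from itertools import combinations

def rank(rows, tol=1e-10):
    """Rank of a matrix (list of rows) via Gaussian elimination with partial pivoting."""
    a = [list(map(float, r)) for r in rows]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[pivot][c]) < tol:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            f = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

def spark(A):
    """Smallest number of linearly dependent columns (inf if none are dependent)."""
    n_cols = len(A[0])
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            sub = [[row[c] for c in cols] for row in A]
            if rank(sub) < k:
                return k
    return float("inf")

A = [[1, 0, 1, 1],
     [0, 1, 1, -1]]
print(spark(A))
```

In this toy matrix every pair of columns is linearly independent but any three columns in $\mathbb{R}^2$ are dependent, so condition (4) holds for s = 1 only.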

An estimator $\hat{\mathbf{x}} = \hat{\mathbf{x}}(\mathbf{y})$ is a function of the measurement
vector, and is essentially a random variable. A good estimator
should approximate the parameter x as
accurately as possible. A widely used criterion for the precision
of an estimator is the mean square error (MSE), given by

$$\mathrm{mse}(\hat{\mathbf{x}}) = E_{\mathbf{y};\mathbf{x}}\left[\|\hat{\mathbf{x}}(\mathbf{y}) - \mathbf{x}\|_{\ell_2}^2\right].$$ (5)

Here, $E_{\mathbf{y};\mathbf{x}}[\cdot]$ denotes the expectation taken with respect to the
pdf $p(\mathbf{y}; \mathbf{x})$ of the measurement y parameterized by x. Note
that the MSE in general depends on x.

A. Review of Unbiasedness and the Barankin bound 

In order to obtain a good estimator, one usually resorts to
the following optimization problem:

$$\arg\min_{\hat{\mathbf{x}}(\cdot)} \mathrm{mse}(\hat{\mathbf{x}}) = \arg\min_{\hat{\mathbf{x}}(\cdot)} E_{\mathbf{y};\mathbf{x}}\left[\|\hat{\mathbf{x}}(\mathbf{y}) - \mathbf{x}\|_{\ell_2}^2\right],$$

and hopes that its solution provides an estimator that
achieves the minimum MSE globally. However, if no restrictions
are imposed on the estimators, the solution of the above
problem is trivially $\hat{\mathbf{x}}(\mathbf{y}) = \mathbf{x}$, which achieves the minimum
MSE (zero) only at one specific parameter value. This shows
that a globally best estimator does not exist. Therefore one
should impose proper restrictions on
the estimators.

One widely used type of restriction on estimators is unbiasedness.
Unbiased estimators are the ones that satisfy

$$E_{\mathbf{y};\mathbf{x}}[\hat{\mathbf{x}}(\mathbf{y})] = \mathbf{x}, \quad \forall \mathbf{x} \in \mathcal{X}.$$ (6)

Here $\mathcal{X}$ denotes the set of all possible values of the parameter
x. In the sparse setting, the notation $\mathcal{X}_s$ is used for this set,
which can be formulated as

$$\mathcal{X}_s = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_{\ell_0} \le s\}.$$ (7)

The set of all unbiased estimators will be denoted by $\mathcal{U}$.

While unbiasedness excludes trivial estimators such as
$\hat{\mathbf{x}}(\mathbf{y}) = \mathbf{x}_1$ for some specific $\mathbf{x}_1$, it is still not guaranteed
that there exists a uniformly minimum variance unbiased
(UMVU) estimator, i.e. an unbiased estimator that achieves
the minimum MSE globally among all the unbiased estimators.
For the case where the sensing matrix perturbation vanishes, it
has already been proved that the UMVU estimator does not exist when
s < n [20]. Despite the fact that the UMVU estimator may not
exist, one can still solve the following minimization problem

$$\arg\min_{\hat{\mathbf{x}} \in \mathcal{U}} \mathrm{mse}(\hat{\mathbf{x}}) = \arg\min_{\hat{\mathbf{x}} \in \mathcal{U}} E_{\mathbf{y};\mathbf{x}}\left[\|\hat{\mathbf{x}}(\mathbf{y}) - \mathbf{x}\|_{\ell_2}^2\right]$$ (8)

for each $\mathbf{x} \in \mathcal{X}_s$ separately. The solution of the above
minimization problem for a specific x is called a locally
minimum variance unbiased (LMVU) estimator, and its MSE
is known as the Barankin bound (BB) [25]. The BB can
be viewed as the lower bound on the MSE of all unbiased
estimators. Unfortunately, the BB often does not possess a
closed-form expression; even in the case where a closed-form
expression exists, its computation is usually of great
complexity.

In the remainder of this paper, two types of lower bounds of 
the BB, the constrained Cramer-Rao bound (CCRB) and the 
Hammersley-Chapman-Robbins bound (HCRB), are discussed 
for sparse estimation with general perturbation. As they are 
lower bounds of the BB, they can also be viewed as the lower 
bounds of the MSE of unbiased estimators. Although they are 
not as tight as the BB, they usually possess simpler expressions 
and can provide insights into the properties of the BB. 

III. The Constrained CRB 

In this section, the constrained Cramér-Rao bound (CCRB)
of the estimation problem (2) is considered. The CCRB
generalizes the original CRB to the case where the parameter
is constrained to an arbitrary given set. The CCRB has been
studied recently, especially for
sparse estimation [20], [26]. The CCRB can be summarized
by the following proposition.

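Proposition 1: [26] Suppose that the parameter $\mathbf{x} \in \mathbb{R}^n$
lies in a given set $\mathcal{X}$ (see footnote 1), and $\mathbf{x}_0$ is a specific value of x. Define
the set $\mathcal{F}(\mathbf{x}_0)$ as follows,

$$\mathcal{F}(\mathbf{x}_0) = \{\mathbf{v} \in \mathbb{R}^n : \exists\, \epsilon_0(\mathbf{v}) > 0 \ \text{s.t.}\ \forall \epsilon \in (0, \epsilon_0(\mathbf{v})),\ \mathbf{x}_0 + \epsilon\mathbf{v} \in \mathcal{X}\}.$$

It can be proved that $\mathcal{F}(\mathbf{x}_0)$ is a subspace of $\mathbb{R}^n$. Let
$\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_l]$ be an orthogonal basis of $\mathcal{F}(\mathbf{x}_0)$, and J be the
Fisher information matrix (FIM),

$$\mathbf{J}(\mathbf{x}_0) = E_{\mathbf{y};\mathbf{x}_0}\left[(\nabla_{\mathbf{x}} \ln p(\mathbf{y};\mathbf{x}))(\nabla_{\mathbf{x}}^T \ln p(\mathbf{y};\mathbf{x}))\right].$$ (9)

If

$$\mathcal{R}(\mathbf{V}\mathbf{V}^T) \subseteq \mathcal{R}(\mathbf{V}\mathbf{V}^T\mathbf{J}\mathbf{V}\mathbf{V}^T),$$ (10)

where $\mathcal{R}(\mathbf{P})$ is the column space of P, then for any estimator
$\hat{\mathbf{x}}$ which is unbiased in the neighborhood of $\mathbf{x}_0$, its covariance
matrix $\mathrm{Cov}(\hat{\mathbf{x}})$ satisfies

$$\mathrm{Cov}(\hat{\mathbf{x}}) \succeq \mathbf{V}(\mathbf{V}^T\mathbf{J}\mathbf{V})^\dagger\mathbf{V}^T,$$ (11)

where $\mathbf{P} \succeq \mathbf{Q}$ means that $\mathbf{P} - \mathbf{Q}$ is positive semidefinite. The
trace of the covariance matrix gives the MSE of the estimator.
Conversely, if (10) does not hold, then there exists no finite-variance
estimator which is unbiased in the neighborhood of
$\mathbf{x}_0$.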

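Remark 1: Note that the estimators considered in Proposition 1
are "unbiased in the neighborhood of $\mathbf{x}_0$", which can be
rigorously formulated as follows: define $\mathbf{b}(\mathbf{x}) = E_{\mathbf{y};\mathbf{x}}[\hat{\mathbf{x}} - \mathbf{x}]$ to
be the bias at x; then the estimator $\hat{\mathbf{x}}$ is said to be unbiased
in the neighborhood of $\mathbf{x}_0$ if and only if

$$\mathbf{b}(\mathbf{x}_0) = \mathbf{0} \quad \text{and} \quad \left.\frac{\partial \mathbf{b}(\mathbf{x})}{\partial \mathbf{v}}\right|_{\mathbf{x} = \mathbf{x}_0} = \mathbf{0}, \quad \forall \mathbf{v} \in \mathcal{F}(\mathbf{x}_0).$$ (12)

Footnote 1: There are certain requirements that $\mathcal{X}$ has to meet. Refer to [20] for
a detailed exposition. Fortunately, the set $\mathcal{X}_s$ of the sparse setting meets all
the requirements.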



We denote the set containing all the estimators unbiased in
the neighborhood of $\mathbf{x}_0$ as $\mathcal{U}_{\mathbf{x}_0}$. In the sparse setting, it can
be seen that

$$\mathcal{U} \subseteq \mathcal{U}_{\mathbf{x}_0}, \quad \forall \mathbf{x}_0 \in \mathcal{X}_s,$$ (13)

therefore the CCRB is no greater than the BB. Nevertheless,
the CCRB has a simple closed-form expression which is
convenient to analyze. In this section we relax our restrictions
on the estimators to unbiasedness in the neighborhood of a
specific parameter value.

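From Proposition 1 it can be seen that the computation of the
CCRB mainly relies on the computation of the FIM J and
the orthogonal basis V. The FIM J is given by the following
lemma.

Lemma 1: The Fisher information matrix is given by

$$\mathbf{J}(\mathbf{x}) = \frac{1}{\sigma_{\mathbf{x}}^2}\left(\mathbf{A}^T\mathbf{A} + \frac{2m\sigma_e^4}{\sigma_{\mathbf{x}}^2}\mathbf{x}\mathbf{x}^T\right),$$ (14)

where $\sigma_{\mathbf{x}}^2$ is defined as

$$\sigma_{\mathbf{x}}^2 = \sigma_e^2\|\mathbf{x}\|_{\ell_2}^2 + \sigma_n^2.$$ (15)

Proof: The proof is postponed to Appendix A. □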
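
The structure of this matrix can be made concrete with a short numerical sketch. It assumes, as follows from model (2), that $\mathbf{y} \sim \mathcal{N}(\mathbf{A}\mathbf{x}, \sigma_{\mathbf{x}}^2\mathbf{I}_m)$ with $\sigma_{\mathbf{x}}^2 = \sigma_e^2\|\mathbf{x}\|_{\ell_2}^2 + \sigma_n^2$, and evaluates $\mathbf{J} = (\mathbf{A}^T\mathbf{A} + (2m\sigma_e^4/\sigma_{\mathbf{x}}^2)\mathbf{x}\mathbf{x}^T)/\sigma_{\mathbf{x}}^2$, which is what the standard Fisher information formula for a Gaussian with parameter-dependent mean and covariance gives. The function name and the example values are hypothetical.

```python
def fisher_information(A, x, sigma_e, sigma_n):
    """FIM of y ~ N(A x, sigma_x^2 I_m), sigma_x^2 = sigma_e^2 ||x||^2 + sigma_n^2."""
    m, n = len(A), len(x)
    sigma_x2 = sigma_e ** 2 * sum(v * v for v in x) + sigma_n ** 2
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Mean-dependent part: (A^T A)_{ij}.
            gram = sum(A[r][i] * A[r][j] for r in range(m))
            # Variance-dependent part: rank-one term along x x^T.
            rank_one = 2.0 * m * sigma_e ** 4 * x[i] * x[j] / sigma_x2
            J[i][j] = (gram + rank_one) / sigma_x2
    return J

A = [[1.0, 0.0],
     [0.0, 1.0]]
x = [1.0, 0.0]
print(fisher_information(A, x, sigma_e=1.0, sigma_n=1.0))
```

With $\sigma_e = 0$ the rank-one term vanishes and the familiar $\mathbf{A}^T\mathbf{A}/\sigma_n^2$ is recovered; with $\sigma_e > 0$ the extra information lies only in the direction of x.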
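Next we deal with the orthogonal basis V of the subspace
$\mathcal{F}$. The cases $\|\mathbf{x}\|_{\ell_0} = s$ and $\|\mathbf{x}\|_{\ell_0} < s$ should be
discussed separately. For the case $\|\mathbf{x}\|_{\ell_0} = s$, it can be seen
that for every $k \in S = \mathrm{supp}(\mathbf{x})$, one has $\|\mathbf{x} + \epsilon\mathbf{e}_k\|_{\ell_0} \le s$,
i.e. $\mathbf{x} + \epsilon\mathbf{e}_k \in \mathcal{X}_s$ for arbitrary $\epsilon$, and therefore $\mathbf{e}_k \in \mathcal{F}$; on
the other hand, for $k \notin S$, one has $\|\mathbf{x} + \epsilon\mathbf{e}_k\|_{\ell_0} > s$ for any $\epsilon \ne 0$ and thus
$\mathbf{e}_k \notin \mathcal{F}$. It follows that the subspace $\mathcal{F}(\mathbf{x})$ can be formulated
as

$$\mathcal{F}(\mathbf{x}) = \mathrm{span}(\{\mathbf{e}_{S_1}, \ldots, \mathbf{e}_{S_s}\}),$$ (16)

in which $S_1, \ldots, S_s$ are the elements of the support $S = \mathrm{supp}(\mathbf{x})$,
and the basis V can take the following form

$$\mathbf{V} = [\mathbf{e}_{S_1}, \ldots, \mathbf{e}_{S_s}].$$ (17)

For the case $\|\mathbf{x}\|_{\ell_0} < s$, the situation is rather different,
because for every $\mathbf{e}_k$, $k = 1, \ldots, n$, one has $\|\mathbf{x} + \epsilon\mathbf{e}_k\|_{\ell_0} \le s$.
Thus it can be concluded that $\mathcal{F}(\mathbf{x}) = \mathbb{R}^n$ in this case, and
the basis V can be given by $[\mathbf{e}_1, \ldots, \mathbf{e}_n] = \mathbf{I}_n$.

With the form of the FIM J and the basis V, one can readily
derive the CCRB of the problem (2). The situation in which
x has maximal support ($\|\mathbf{x}\|_{\ell_0} = s$) is analyzed first.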

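Theorem 1: For $\|\mathbf{x}\|_{\ell_0} = s$, the CCRB is given by

$$\mathrm{mse}(\hat{\mathbf{x}}) \ge \sigma_{\mathbf{x}}^2\left(\mathrm{tr}\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1}\right) - \frac{2m\sigma_e^4\|(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}_S^T(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S}\right),$$ (18)

where $\sigma_{\mathbf{x}}^2$ is defined in (15).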

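Proof: The condition (10) should be checked first. The
matrix $\mathbf{V}^T\mathbf{J}\mathbf{V}$ is given by

$$\mathbf{V}^T\mathbf{J}\mathbf{V} = \mathbf{V}^T\left[\frac{1}{\sigma_{\mathbf{x}}^2}\left(\mathbf{A}^T\mathbf{A} + \frac{2m\sigma_e^4}{\sigma_{\mathbf{x}}^2}\mathbf{x}\mathbf{x}^T\right)\right]\mathbf{V}
= \frac{1}{\sigma_{\mathbf{x}}^2}\left(\mathbf{V}^T\mathbf{A}^T\mathbf{A}\mathbf{V} + \frac{2m\sigma_e^4}{\sigma_{\mathbf{x}}^2}\mathbf{V}^T\mathbf{x}\mathbf{x}^T\mathbf{V}\right)
= \frac{1}{\sigma_{\mathbf{x}}^2}\left(\mathbf{A}_S^T\mathbf{A}_S + \frac{2m\sigma_e^4}{\sigma_{\mathbf{x}}^2}\mathbf{x}_S\mathbf{x}_S^T\right).$$

Because we have assumed that spark(A) > 2s, it follows that
$\mathbf{A}_S$ has full column rank, and thus $\mathbf{V}^T\mathbf{J}\mathbf{V}$ is invertible;
employing the Sherman-Morrison formula [33] gives

$$(\mathbf{V}^T\mathbf{J}\mathbf{V})^{-1} = \sigma_{\mathbf{x}}^2\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1} - \frac{2m\sigma_e^4(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S\mathbf{x}_S^T(\mathbf{A}_S^T\mathbf{A}_S)^{-1}}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}_S^T(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S}\right),$$ (19)

and thus $\mathcal{R}(\mathbf{V}\mathbf{V}^T\mathbf{J}\mathbf{V}\mathbf{V}^T) = \mathcal{R}(\mathbf{V}\mathbf{V}^T)$, i.e. the CCRB exists.
The expression of the CCRB can also be obtained from (19):

$$\mathrm{mse}(\hat{\mathbf{x}}) \ge \mathrm{tr}\left(\mathbf{V}(\mathbf{V}^T\mathbf{J}\mathbf{V})^{-1}\mathbf{V}^T\right) = \mathrm{tr}\left(\mathbf{V}^T\mathbf{V}(\mathbf{V}^T\mathbf{J}\mathbf{V})^{-1}\right) = \mathrm{tr}\left((\mathbf{V}^T\mathbf{J}\mathbf{V})^{-1}\right)$$
$$= \sigma_{\mathbf{x}}^2\left(\mathrm{tr}\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1}\right) - \frac{2m\sigma_e^4\|(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}_S^T(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S}\right).$$ (20)
□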
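
The Sherman-Morrison step can be checked numerically. The sketch below evaluates the maximal-support bound in two ways: through the closed form $\sigma_{\mathbf{x}}^2(\mathrm{tr}(\mathbf{G}^{-1}) - c\|\mathbf{G}^{-1}\mathbf{x}_S\|^2/(\sigma_{\mathbf{x}}^2 + c\,\mathbf{x}_S^T\mathbf{G}^{-1}\mathbf{x}_S))$ with $\mathbf{G} = \mathbf{A}_S^T\mathbf{A}_S$ and $c = 2m\sigma_e^4$, and by directly inverting $\mathbf{V}^T\mathbf{J}\mathbf{V} = (\mathbf{G} + (c/\sigma_{\mathbf{x}}^2)\mathbf{x}_S\mathbf{x}_S^T)/\sigma_{\mathbf{x}}^2$. The small Gauss-Jordan `inverse` helper, the example matrix and the noise levels are assumptions made for illustration.

```python
def inverse(M):
    """Inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def ccrb_max_support(A, x, support, sigma_e, sigma_n):
    """Return (closed_form, direct) values of the maximal-support CCRB."""
    m, s = len(A), len(support)
    xs = [x[k] for k in support]
    sigma_x2 = sigma_e ** 2 * sum(v * v for v in x) + sigma_n ** 2
    c = 2.0 * m * sigma_e ** 4
    # Gram matrix G = A_S^T A_S restricted to the support columns.
    G = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in support]
         for i in support]
    G_inv = inverse(G)
    g_x = [sum(G_inv[i][j] * xs[j] for j in range(s)) for i in range(s)]
    quad = sum(xs[i] * g_x[i] for i in range(s))      # x_S^T G^{-1} x_S
    trace_G_inv = sum(G_inv[i][i] for i in range(s))
    closed = sigma_x2 * (trace_G_inv
                         - c * sum(v * v for v in g_x) / (sigma_x2 + c * quad))
    # Direct route: invert V^T J V and take the trace.
    VJV = [[(G[i][j] + c * xs[i] * xs[j] / sigma_x2) / sigma_x2
            for j in range(s)] for i in range(s)]
    VJV_inv = inverse(VJV)
    direct = sum(VJV_inv[i][i] for i in range(s))
    return closed, direct

A = [[1.0, 0.2, 0.0, 0.5],
     [0.0, 1.0, 0.3, -0.4],
     [0.1, 0.0, 1.0, 0.2]]
x = [2.0, 0.0, -1.0, 0.0]
closed, direct = ccrb_max_support(A, x, support=[0, 2], sigma_e=0.2, sigma_n=0.1)
print(closed, direct)
```

The two values should agree to floating-point precision, since the closed form is an exact rank-one update of the inverse.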

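Next consider the case in which x has nonmaximal support.
The CCRB of this case can be summarized as the following
theorem.

Theorem 2: For $\|\mathbf{x}\|_{\ell_0} < s$, if the FIM J is nonsingular,
then the CCRB exists. Furthermore, if A has full column rank,
then the CCRB is given by

$$\mathrm{mse}(\hat{\mathbf{x}}) \ge \sigma_{\mathbf{x}}^2\left(\mathrm{tr}\left((\mathbf{A}^T\mathbf{A})^{-1}\right) - \frac{2m\sigma_e^4\|(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{x}\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}^T(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{x}}\right).$$ (21)

If the FIM J is singular, then there do not exist finite-variance
estimators that are unbiased in the neighborhood of x.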

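Proof: Because in this case $\mathbf{V} = \mathbf{I}_n$, the two subspaces
are respectively $\mathcal{R}(\mathbf{V}\mathbf{V}^T) = \mathbb{R}^n$ and $\mathcal{R}(\mathbf{V}\mathbf{V}^T\mathbf{J}\mathbf{V}\mathbf{V}^T) = \mathcal{R}(\mathbf{J})$.
Therefore when J is invertible, the
condition (10) holds, and the CCRB can be obtained by taking
the trace of $\mathbf{J}^{-1}$. In the special case where A has full column
rank, the inverse $\mathbf{J}^{-1}$ can be calculated with the help of
the Sherman-Morrison formula,

$$\mathbf{J}^{-1} = \sigma_{\mathbf{x}}^2\left((\mathbf{A}^T\mathbf{A})^{-1} - \frac{2m\sigma_e^4(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{x}\mathbf{x}^T(\mathbf{A}^T\mathbf{A})^{-1}}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}^T(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{x}}\right),$$ (22)

and the CCRB is

$$\mathrm{mse}(\hat{\mathbf{x}}) \ge \mathrm{tr}(\mathbf{J}^{-1}) = \sigma_{\mathbf{x}}^2\left(\mathrm{tr}\left((\mathbf{A}^T\mathbf{A})^{-1}\right) - \frac{2m\sigma_e^4\|(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{x}\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}^T(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{x}}\right).$$ (23)

When J is not invertible, the dimension of the column space
of J is less than n. Thus the condition (10) does not hold, and
finite-variance estimators that are unbiased in the neighborhood of x do not
exist. □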

Theorem 2 illustrates that for the nonmaximal support case,
the prior information of sparsity cannot lower the theoretical
bound on the estimation error compared to the ordinary problem
where x can be any vector in $\mathbb{R}^n$. This demonstrates a gap
between the maximal and nonmaximal support cases of the
CCRB, which is the main topic of the next subsection.



A. Gap between the Maximal and the Nonmaximal Cases 

The gap between the maximal and nonmaximal cases of the
CCRB can be revealed by the following example. Suppose
we observe a sparse vector $\mathbf{x} \in \mathcal{X}_s$ with s nonzero entries.
Then by Theorem 1 the CCRB is given by (18).
Next we assume that one of the nonzero components, say $x_q$,
tends to zero. Consequently, the CCRB given by (18) tends to
a specific limit $\gamma_1$. However, when $x_q$ equals zero, the CCRB
cannot be computed by (18) anymore because the support is
now nonmaximal; the CCRB of the $x_q = 0$ case
is therefore given by (21), which we temporarily denote as $\gamma_2$. It is
interesting to find that $\gamma_1$ and $\gamma_2$ are not equal to each other,
which means that the CCRB is not a continuous function of
the parameter x.

Generally one has $\gamma_1 \le \gamma_2$, which can be inferred as
follows. $\gamma_1$ can be seen as the CRB of estimators which are
unbiased on the subspace $\mathrm{span}(\{\mathbf{e}_i : i \in \mathrm{supp}(\mathbf{x})\})$, while $\gamma_2$
can be seen as the CRB of estimators unbiased on $\mathbb{R}^n$. If the
former class of estimators is denoted by $\mathcal{U}_1$, and the latter is
denoted by $\mathcal{U}_2$, it can be seen that $\mathcal{U}_1 \supseteq \mathcal{U}_2$, and thus the lower
bounds on the estimation error of the two classes should satisfy
$\gamma_1 \le \gamma_2$. This conclusion can also be verified by numerical
approaches.
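
The gap is easy to see in the simplest setting. The sketch below assumes an identity sensing matrix (m = n), for which both CCRB expressions reduce to $\sigma_{\mathbf{x}}^2(k - 2n\sigma_e^4\|\mathbf{x}\|^2/(\sigma_{\mathbf{x}}^2 + 2n\sigma_e^4\|\mathbf{x}\|^2))$ with k = s in the maximal-support case and k = n in the nonmaximal-support case. The function name and the numbers are illustrative choices, not values from the paper.

```python
def ccrb_identity(x, k, sigma_e, sigma_n):
    """CCRB for A = I_n: use k = s (maximal support) or k = n (nonmaximal support)."""
    n = len(x)
    norm2 = sum(v * v for v in x)
    sigma_x2 = sigma_e ** 2 * norm2 + sigma_n ** 2
    c = 2.0 * n * sigma_e ** 4
    return sigma_x2 * (k - c * norm2 / (sigma_x2 + c * norm2))

n, s = 5, 2
sigma_e, sigma_n = 0.1, 0.2
# gamma_1: maximal-support bound as the smallest nonzero entry shrinks to zero.
gamma_1 = ccrb_identity([1.0, 1e-9, 0.0, 0.0, 0.0], s, sigma_e, sigma_n)
# gamma_2: nonmaximal-support bound once that entry is exactly zero.
gamma_2 = ccrb_identity([1.0, 0.0, 0.0, 0.0, 0.0], n, sigma_e, sigma_n)
print(gamma_1, gamma_2, gamma_2 - gamma_1)
```

In this special case the jump equals $\sigma_{\mathbf{x}}^2(n - s)$, so it does not shrink as $x_q \to 0$.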

This gap originates from the "discontinuity" of the
restriction that the estimator should be unbiased in the
neighborhood of a specific parameter value. The "neighborhood"
of a parameter point having maximal support in $\mathcal{X}_s$ has an
entirely different structure from that of a parameter point
having nonmaximal support: the former is a subspace that
is locally identical to $\mathbb{R}^s$, while the latter is a union of
s-dimensional subspaces. Fig. 1 is a geometric illustration of
the structure of the neighborhood of x. It can be seen that
as $x_q \to 0$, the structure of the neighborhood of x changes
abruptly: from being locally identical to $\mathbb{R}^s$ to being
locally identical to $\mathbb{R}^n$. This is the cause of the gap between
the maximal and nonmaximal cases.

On the other hand, if a stronger condition, global unbiasedness,
is imposed on the considered estimators instead of
unbiasedness in the neighborhood, i.e. the estimators are restricted
to be unbiased for all $\mathbf{x} \in \mathcal{X}_s$, then this discontinuity should
not occur. Thus the corresponding lower bound should also be
continuous as $x_q \to 0$, i.e. as x changes from having maximal
support to having nonmaximal support. Since $\gamma_1 \le \gamma_2$, it is
further demonstrated that the CCRB for maximal support is
not sufficiently tight for estimators in $\mathcal{U}$, especially when the
support of x is nearly nonmaximal, i.e. at least one of the
non-zero entries is small compared to the other non-zero entries.

B. Further Analysis of the CCRB 

In order to have a more profound understanding of the
CCRB, we aim to compare it with a more intuitive quantity
and analyze its asymptotic behavior. The following analysis
will mainly focus on the maximal support case. As can be seen
from (18) and (21), the CCRBs of the two cases share a similar
form, and thus the main results provided in the following text
remain valid for the nonmaximal support case with minor
modifications.

Fig. 1. A geometrical demonstration of the discontinuity between the
neighborhood structures of the maximal and the nonmaximal support cases:
(a) the maximal support case; (b) the nonmaximal support case.
The vectors $\mathbf{v}_i$ are basis vectors of the neighborhood subspace $\mathcal{F}$.



The expression (18) contains two terms, the first of which
is rather simple, while the second is much more
complicated. Fortunately, the following proposition relates the
first term $\sigma_{\mathbf{x}}^2\,\mathrm{tr}\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1}\right)$ to the MSE of the oracle
pseudoinverse estimator, which provides an intuitive explanation.
The definition and performance of the oracle pseudoinverse
estimator can be summarized as follows.

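Proposition 2: For a given support S whose size is s, define
$\hat{\mathbf{x}}_{\mathrm{pinv}}$ to be the following estimator

$$(\hat{\mathbf{x}}_{\mathrm{pinv}}(\mathbf{y}))_S = \mathbf{A}_S^\dagger\mathbf{y} = (\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{A}_S^T\mathbf{y}, \qquad (\hat{\mathbf{x}}_{\mathrm{pinv}}(\mathbf{y}))_{S^c} = \mathbf{0}.$$ (24)

This estimator is unbiased in the neighborhood of any parameter
value whose support is S. The MSE of $\hat{\mathbf{x}}_{\mathrm{pinv}}$ is

$$\sigma_{\mathbf{x}}^2\,\mathrm{tr}\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1}\right), \quad \forall \mathbf{x} : \mathrm{supp}(\mathbf{x}) = S.$$ (25)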



Proposition 2 demonstrates that the first term of the CCRB
is just the MSE of the oracle pseudoinverse estimator. This
term is also similar to the CCRB of the case where only
measurement noise exists. Various references (e.g. [20])
have shown that the CCRB when only measurement noise
exists is the variance of the noise on y multiplied by the
trace of $(\mathbf{A}_S^T\mathbf{A}_S)^{-1}$, and the oracle pseudoinverse estimator
achieves this bound.
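
The MSE of the oracle pseudoinverse estimator can be checked by simulation in a case where no matrix inversion is needed. The sketch below assumes a diagonal sensing matrix A = diag(a), a simplification chosen here for illustration, so that the pseudoinverse of $\mathbf{A}_S$ just rescales the measurements on the support and $\mathrm{tr}((\mathbf{A}_S^T\mathbf{A}_S)^{-1}) = \sum_{i \in S} 1/a_i^2$. All names and numbers are hypothetical.

```python
import random

def oracle_mse_diagonal(a, x, support, sigma_e, sigma_n, trials, seed=1):
    """Monte Carlo MSE of the oracle pseudoinverse estimator when A = diag(a)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        err2 = 0.0
        for i in support:
            # i-th measurement: y_i = a_i x_i + (E x)_i + n_i.
            perturb = sum(rng.gauss(0.0, sigma_e) * xj for xj in x)
            y_i = a[i] * x[i] + perturb + rng.gauss(0.0, sigma_n)
            # For diagonal A the pseudoinverse of A_S simply rescales y_i;
            # off-support entries are estimated as zero and contribute no error.
            err2 += (y_i / a[i] - x[i]) ** 2
        total += err2
    return total / trials

a = [1.0, 2.0, 0.5, 1.5]
x = [1.0, 0.0, -1.0, 0.0]
support = [0, 2]
sigma_e, sigma_n = 0.2, 0.3
sigma_x2 = sigma_e ** 2 * sum(v * v for v in x) + sigma_n ** 2
predicted_mse = sigma_x2 * sum(1.0 / a[i] ** 2 for i in support)
observed_mse = oracle_mse_diagonal(a, x, support, sigma_e, sigma_n, trials=20000)
print(predicted_mse, observed_mse)
```

The simulated MSE should match $\sigma_{\mathbf{x}}^2\,\mathrm{tr}((\mathbf{A}_S^T\mathbf{A}_S)^{-1})$ up to Monte Carlo error, i.e. the first term of the CCRB with the x-dependent noise variance.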

The second term of (18) is more complicated. This term
stems from the dependence of the variance of the total noise on
the parameter x, and it reveals the possibility that this dependence might
help estimate x more accurately. However, one still hopes that
this term is not dominant in the CCRB; we therefore aim to
bound it by a simpler alternative, and especially to
study how it varies with the dimension of the signal.

In order to bound the second term, some assumptions have
to be made on the matrix A.

Assumption 1: For the sensing matrix A in (18), it is
assumed that there exist constants $\vartheta_{l,s} \in (0, 1)$ and $\vartheta_{u,s} > 0$
such that

$$(1 - \vartheta_{l,s})\|\mathbf{x}\|_{\ell_2}^2 \le \|\mathbf{A}\mathbf{x}\|_{\ell_2}^2 \le (1 + \vartheta_{u,s})\|\mathbf{x}\|_{\ell_2}^2$$ (26)

for any s-sparse parameter x. Because any constants that are
greater than $\vartheta_{l,s}$ or $\vartheta_{u,s}$ respectively still satisfy the above
inequalities, in the following $\vartheta_{l,s}$ and $\vartheta_{u,s}$ denote
the smallest such constants.

Remark 2: This assumption is very similar to the commonly
used restricted isometry property [34], [35], and
possesses the same form as the asymmetric restricted isometry
property proposed in [36]. However, the sensing matrix is not
restricted to be underdetermined in the above assumption.

Assumption 1 actually bounds the eigenvalues of $\mathbf{A}_S^T\mathbf{A}_S$
that appears in the second term of the CCRB. Armed with this
assumption, the following theorem provides lower and upper
bounds on the second term of the CCRB.

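Theorem 3: Denote the opposite of the second term of the
CCRB as $d_{\mathrm{CCRB}}$, i.e.

$$d_{\mathrm{CCRB}} = \sigma_{\mathbf{x}}^2\,\frac{2m\sigma_e^4\|(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2m\sigma_e^4\,\mathbf{x}_S^T(\mathbf{A}_S^T\mathbf{A}_S)^{-1}\mathbf{x}_S}.$$ (27)

Then $d_{\mathrm{CCRB}}$ satisfies the following inequalities:

$$d_{\mathrm{CCRB}} \le \sigma_{\mathbf{x}}^2\,\frac{(1 + \vartheta_{u,s})^2}{(1 - \vartheta_{l,s})^2}\cdot\frac{2c_e}{2(1 - \vartheta_{l,s})c_e + 1 + \vartheta_{u,s} + c_n/c_e},$$ (28)

$$d_{\mathrm{CCRB}} \ge \sigma_{\mathbf{x}}^2\,\frac{(1 - \vartheta_{l,s})^2}{(1 + \vartheta_{u,s})^2}\cdot\frac{2c_e}{2(1 + \vartheta_{u,s})c_e + 1 - \vartheta_{l,s} + c_n/c_e},$$ (29)

where

$$c_e = \frac{ms\sigma_e^2}{\mathrm{tr}(\mathbf{A}_S^T\mathbf{A}_S)}, \qquad c_n = \frac{m\sigma_n^2}{\|\mathbf{x}\|_{\ell_2}^2}$$

indicate the matrix perturbation level and the measurement
noise level (see footnote 2) respectively.

Proof: The proof is postponed to Appendix B. □

The ratio $\gamma_{\mathrm{CCRB}} = d_{\mathrm{CCRB}}/(\mathrm{CCRB} + d_{\mathrm{CCRB}})$, i.e. the ratio of the second term
to the first term of the CCRB, is bounded by the following
inequalities:

$$\gamma_{\mathrm{CCRB}} \le \frac{1}{s}\cdot\frac{(1 + \vartheta_{u,s})^3}{(1 - \vartheta_{l,s})^2}\cdot\frac{2c_e}{2(1 - \vartheta_{l,s})c_e + 1 + \vartheta_{u,s} + c_n/c_e},$$ (30)

$$\gamma_{\mathrm{CCRB}} \ge \frac{1}{s}\cdot\frac{(1 - \vartheta_{l,s})^3}{(1 + \vartheta_{u,s})^2}\cdot\frac{2c_e}{2(1 + \vartheta_{u,s})c_e + 1 - \vartheta_{l,s} + c_n/c_e}.$$ (31)

Proof: See Appendix C. □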
Theorem 3 provides a very simple approximate expression
that captures how $d_{\mathrm{CCRB}}$ and $\gamma_{\mathrm{CCRB}}$ vary with the noise levels
$c_e$ and $c_n$:

$$d_{\mathrm{CCRB}} \approx \sigma_{\mathbf{x}}^2\,\frac{2c_e}{2c_e + 1 + c_n/c_e},$$ (32)

$$\gamma_{\mathrm{CCRB}} \approx \frac{1}{s}\cdot\frac{2c_e}{2c_e + 1 + c_n/c_e},$$ (33)

provided that the constants $\vartheta_{l,s}$ and $\vartheta_{u,s}$ are small compared to
1. It is easily verified that the quantity $2c_e/(2c_e + 1 + c_n/c_e)$ is
always less than 1, and therefore the ratio $\gamma_{\mathrm{CCRB}}$ can be
upper-bounded approximately by 1/s. Furthermore, (33) implies that
as s increases, the second term $d_{\mathrm{CCRB}}$ becomes less and
less important, and finally becomes negligible. This can be
considered as the asymptotic behavior of $d_{\mathrm{CCRB}}$ and the
CCRB, which can be summarized as the following corollary.
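
The 1/s decay of the approximate ratio can be tabulated with a few lines of code. The sketch below evaluates $(1/s)\cdot 2c_e/(2c_e + 1 + c_n/c_e)$ for a few support sizes; $c_e$ and $c_n$ are treated simply as given positive noise-level inputs, and the specific values are arbitrary illustrative choices.

```python
def second_term_ratio(s, c_e, c_n):
    """Approximate ratio (1/s) * 2 c_e / (2 c_e + 1 + c_n / c_e)."""
    return (2.0 * c_e / (2.0 * c_e + 1.0 + c_n / c_e)) / s

for s in (2, 8, 32, 128):
    # The noise-dependent factor stays below 1, so the ratio is below 1/s.
    print(s, second_term_ratio(s, c_e=0.5, c_n=0.1), 1.0 / s)
```

For an identity sensing matrix the approximation is exact when $c_e = n\sigma_e^2$ and $c_n = n\sigma_n^2/\|\mathbf{x}\|^2$, which gives a convenient consistency check against the closed-form second term.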

Corollary 1: Assume that there exist constants $\epsilon_l \in (0, 1)$
and $\epsilon_u > 0$ such that when s tends to infinity, the constants
$\vartheta_{l,s}$ and $\vartheta_{u,s}$ keep satisfying $\vartheta_{l,s} \le \epsilon_l$ and $\vartheta_{u,s} \le \epsilon_u$ for
every s. Then if $c_e$ and $c_n$ remain constant, the ratio $\gamma_{\mathrm{CCRB}}$
possesses the following asymptotic behavior:

$$\frac{A}{s} \le \gamma_{\mathrm{CCRB}} \le \frac{B}{s},$$ (34)

where A and B are some positive constants.

Remark 3: This corollary demonstrates that as $s \to \infty$, the
CCRB approaches $\sigma_{\mathbf{x}}^2\,\mathrm{tr}\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1}\right)$, which is just
the MSE of the oracle pseudoinverse estimator. The estimator
is the solution of the following minimization problem:

$$\arg\min_{\mathbf{x}_S} \|\mathbf{y} - \mathbf{A}_S\mathbf{x}_S\|_{\ell_2}^2,$$

which merely minimizes the residual and totally ignores the
fact that the noise depends on x. Therefore it can be
concluded that as s increases, less information can be
obtained from the dependence of the noise term on x to help
reduce the estimation error, and finally this dependence can
be ignored.

Footnote 2: One may argue that $m\sigma_n^2/\|\mathbf{A}\mathbf{x}\|_{\ell_2}^2$ seems more reasonable as an
indication of the measurement noise level. However, because of Assumption 1



IV. The Hammersley-Chapman-Robbins Bound 

The constrained CRB given in the above section possesses
a simple closed form, but it takes into account only
unbiasedness in the neighborhood of a specific parameter value
rather than global unbiasedness for all sparse vectors.
Therefore it can be anticipated that for estimators that are
globally unbiased for sparse parameter values (i.e. for all
estimators in $\mathcal{U}$), the CCRB is not a very tight lower bound.

In this section, the Hammersley-Chapman-Robbins bound
(HCRB) is derived for sparse estimation under the setting of
general perturbation. However, the calculation of the HCRB
for a general sensing matrix A is much more complicated, and
therefore attention is focused only on the simplest case of a
unit sensing matrix in this section [27], [37]. Nevertheless,
the HCRB of this simple case is still instructive for gaining
a qualitative understanding of the HCRB in general cases.

The HCRB in the context of sparse estimation with general 
perturbation can be summarized as the following lemma. 

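Lemma 2: In the setting given in Section II, consider a
specific parameter value $\mathbf{x} \in \mathcal{X}_s$. Suppose $\{\mathbf{v}_i\}_{i=1}^k$ are k
vectors such that $\mathbf{x} + \mathbf{v}_i \in \mathcal{X}_s$ for all $i = 1, \ldots, k$. Then
the covariance matrix of any unbiased estimator $\hat{\mathbf{x}} \in \mathcal{U}$ at x
satisfies

$$\mathrm{Cov}(\hat{\mathbf{x}}) \succeq \mathbf{V}\mathbf{H}^\dagger\mathbf{V}^T.$$ (35)

Here the matrix V is given by

$$\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_k],$$ (36)

and the (i, j)th element of H is

$$H_{ij} = \left(\frac{\sigma_{\mathbf{x}}^4}{\sigma_{\mathbf{x}}^2(\sigma_{\mathbf{x}+\mathbf{v}_i}^2 + \sigma_{\mathbf{x}+\mathbf{v}_j}^2) - \sigma_{\mathbf{x}+\mathbf{v}_i}^2\sigma_{\mathbf{x}+\mathbf{v}_j}^2}\right)^{m/2}
\exp\left(\frac{\sigma_{\mathbf{x}}^2\,\|\sigma_{\mathbf{x}+\mathbf{v}_j}^2\mathbf{A}\mathbf{v}_i + \sigma_{\mathbf{x}+\mathbf{v}_i}^2\mathbf{A}\mathbf{v}_j\|_{\ell_2}^2}{2\sigma_{\mathbf{x}+\mathbf{v}_i}^2\sigma_{\mathbf{x}+\mathbf{v}_j}^2\left[\sigma_{\mathbf{x}}^2(\sigma_{\mathbf{x}+\mathbf{v}_i}^2 + \sigma_{\mathbf{x}+\mathbf{v}_j}^2) - \sigma_{\mathbf{x}+\mathbf{v}_i}^2\sigma_{\mathbf{x}+\mathbf{v}_j}^2\right]} - \frac{\|\mathbf{A}\mathbf{v}_i\|_{\ell_2}^2}{2\sigma_{\mathbf{x}+\mathbf{v}_i}^2} - \frac{\|\mathbf{A}\mathbf{v}_j\|_{\ell_2}^2}{2\sigma_{\mathbf{x}+\mathbf{v}_j}^2}\right) - 1,$$ (37)

where

$$\sigma_{\mathbf{x}+\mathbf{v}_i}^2 = \sigma_e^2\|\mathbf{x} + \mathbf{v}_i\|_{\ell_2}^2 + \sigma_n^2.$$ (38)

Proof: The proof is postponed to Appendix D. □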
It can be seen from Lemma 2 that the HCRB is actually a
family of lower bounds for unbiased estimators. By employing
different sets of $\mathbf{v}_i$, one will generally get different HCRBs,
and the tightest one is their supremum. However, it is often
impossible to obtain a closed-form expression of the supremum
of the HCRB family, and thus our task is to employ a
certain set of $\mathbf{v}_i$ in the hope that the corresponding HCRB
will be simple and easy to analyze.

By appropriately choosing a set of $\mathbf{v}_i$ and applying some
special techniques, the following theorem on the HCRB (see footnote 3) is
obtained.

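Theorem 4: Assume that n > 2, and denote $\beta = x_q^2/\sigma_{\mathbf{x}}^2$,
where $x_q$ is the smallest entry in magnitude of the parameter
x, with q the corresponding index. Then the MSE of any
estimator $\hat{\mathbf{x}} \in \mathcal{U}$ satisfies

$$\mathrm{mse}(\hat{\mathbf{x}}) \ge \sigma_{\mathbf{x}}^2\left(s - \frac{2n\sigma_e^4\|\mathbf{x}\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2n\sigma_e^4\|\mathbf{x}\|_{\ell_2}^2}\right) + \frac{\sigma_{\mathbf{x}}^2(n - s)\beta\exp(-\beta)}{\exp\beta - 1}\times\left(1 - \frac{1}{n - s + \exp\beta\cdot(1 - g(\beta))^{-1}}\right)$$ (39)

for any $\mathbf{x} \in \mathcal{X}_s$ with $\|\mathbf{x}\|_{\ell_0} = s$. The specific form of the
function $g(\beta)$ is

$$g(\beta) = \frac{\beta(1 - 2\sigma_e^2\beta)^2}{(\exp\beta - 1)(1 + 2n\sigma_e^4\beta)},$$ (40)

and $g(\beta)$ satisfies

$$0 < g(\beta) < 1, \quad \forall \beta > 0,\ n > 2.$$ (41)

When $\sigma_n$ and $\sigma_e$ are fixed, one has

$$\lim_{\beta \to 0} g(\beta) = 1.$$ (42)

Proof: See Appendix E. □

(Footnote 2, continued) one has $\|\mathbf{A}\mathbf{x}\|_{\ell_2}^2 \approx \|\mathbf{x}\|_{\ell_2}^2$, and thus for simplicity
$m\sigma_n^2/\|\mathbf{x}\|_{\ell_2}^2$ is employed instead.

Footnote 3: This lower bound is actually a limit of a family of HCRBs. Nevertheless,
this bound will still be referred to as the HCRB in the following text.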

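The quantity $\beta = x_q^2/\sigma_{\mathbf{x}}^2$ represents a special kind of
signal-to-noise ratio, and can be named the "worst case entry SNR"
[27]. This quantity plays a central role in the transition from
maximal support to nonmaximal support. However, it should
be noticed that the domain of $\beta$ has an upper bound
if $\sigma_e \ne 0$:

$$\beta = \frac{x_q^2}{\sigma_e^2\|\mathbf{x}\|_{\ell_2}^2 + \sigma_n^2} < \frac{x_q^2}{\sigma_e^2\|\mathbf{x}\|_{\ell_2}^2} \le \frac{1}{s\sigma_e^2}.$$ (43)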

Therefore the situation here is more complicated than that with
only measurement noise. One of the consequences is that
when $x_q \to +\infty$, the HCRB does not generally tend to the
CCRB of maximal support unless $\sigma_e = 0$, i.e.

$$\lim_{x_q \to +\infty} \frac{\mathrm{HCRB} - \mathrm{CCRB}_{\mathrm{max\ supp}}}{\mathrm{CCRB}_{\mathrm{max\ supp}}} > 0, \quad \text{if } \sigma_e \ne 0.$$ (44)

This phenomenon reveals that global unbiasedness has an
essential effect in the $\sigma_e \ne 0$ case. Fortunately,
when $\sigma_n$ and $\sigma_e$ remain constant, the HCRB converges
to the CCRB of nonmaximal support as $x_q \to 0$, as can be
seen from

$$\lim_{x_q \to 0} \mathrm{HCRB} = \sigma_{\mathbf{x}}^2\left(n - \frac{2n\sigma_e^4\|\mathbf{x}\|_{\ell_2}^2}{\sigma_{\mathbf{x}}^2 + 2n\sigma_e^4\|\mathbf{x}\|_{\ell_2}^2}\right).$$ (45)

Therefore it can be said that the gap between the maximal and
nonmaximal cases is eliminated.

We now compare the above result with similar results
in the literature. In [27], a closed-form expression of
the HCRB for the $\sigma_e^2 = 0$ case is derived by a similar approach
but with a different choice of $\{\mathbf{v}_i\}$. Their closed-form result is
tighter than ours when $\beta$ is sufficiently large, but is not as tight
as ours in the low $\beta$ range and fails to close the gap between the
maximal and nonmaximal cases. In [37], another lower bound
is provided that is tighter than ours for all $\beta > 0$, but its
derivation is based on the RKHS formulation of the Barankin
bound, which is difficult to generalize to the $\sigma_e^2 \ne 0$
case. Despite the fact that our bound is not the tightest, it
still provides a correct qualitative trend of the lower bound for
sparse estimation, and is able to deal with matrix perturbation
without much effort.

A. Further Analysis of the HCRB 

It has already been mentioned that the domain of $\beta$ possesses an upper bound which is not greater than $1/(s\sigma_e^2)$, and thus the $x_q \to +\infty$ limit of the HCRB does not coincide with the CCRB. This difference reflects the effect of the matrix perturbation on the HCRB and is the major topic of this subsection. For simplicity of analysis, in the following text it is assumed that as $x_q \to +\infty$, all other nonzero components of $\mathbf{x}$ equal $x_q$ asymptotically, which leads to the fact that $\beta$ tends to its upper bound $1/(s\sigma_e^2)$.

As $x_q \to +\infty$, the HCRB asymptotically equals

$$\sigma_x^2\left(s - \frac{2n\sigma_e^4\|\mathbf{x}\|_2^2}{\sigma_x^2 + 2n\sigma_e^4\|\mathbf{x}\|_2^2}\right) + \frac{\sigma_x^2\,(n-s)\exp(-1/s\sigma_e^2)}{s\sigma_e^2\left(\exp(1/s\sigma_e^2) - 1\right)}\left(1 - \frac{1}{n - s + \exp(1/s\sigma_e^2)\left(1 - g(1/s\sigma_e^2)\right)^{-1}}\right).$$



The above expression contains two terms, the first of which is just the CCRB of the maximal support case. For the second term, it can be seen that $\sigma_e$ mainly appears in the exponentials, which suggests that there exists a particular value of $\sigma_e$ that separates the low $\sigma_e$ region and the high $\sigma_e$ region. This particular value will be named the transition value of $\sigma_e$ and will be denoted by $\sigma_{e,t}$.

The analysis and computation of $\sigma_{e,t}$ is simple. When $\sigma_e^2 \ll 1/s$, one has $\exp(1/s\sigma_e^2) \gg 1$ and $\exp(-1/s\sigma_e^2) \approx 0$, and thus the HCRB approximates

$$\sigma_x^2\left(s - \frac{2n\sigma_e^4\|\mathbf{x}\|_2^2}{\sigma_x^2 + 2n\sigma_e^4\|\mathbf{x}\|_2^2}\right),$$

which is just the maximal support CCRB. However, if the matrix perturbation is large enough so that $\sigma_e^2 \gg 1/s$, the difference between the HCRB and the CCRB cannot be neglected. Therefore $1/\sqrt{s}$ can be regarded as the transition point $\sigma_{e,t}$: when $\sigma_e < \sigma_{e,t}$, the HCRB degenerates to the CCRB as $x_q \to +\infty$; when $\sigma_e > \sigma_{e,t}$, the lower bound is raised for large $x_q$ and does not degenerate to the CCRB.
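The transition can be illustrated numerically. The sketch below evaluates the second term of the $x_q \to +\infty$ bound, divided by $\sigma_x^2(n-s)$, at $\beta = 1/(s\sigma_e^2)$ for several values of $\sigma_e$. The helper names `g` and `extra_term` and the sizes `n = 100`, `s = 10` are assumptions made only for this illustration.

```python
import math

def g(beta, n, sigma_e):
    """g(beta) as defined in (40)."""
    return (beta * (1.0 - 2.0 * sigma_e**2 * beta) ** 2
            / (math.expm1(beta) * (1.0 + 2.0 * n * sigma_e**4 * beta)))

def extra_term(n, s, sigma_e):
    """Second term of the x_q -> +inf bound, divided by sigma_x^2 * (n - s)."""
    beta = 1.0 / (s * sigma_e**2)            # limiting value of beta
    lead = beta * math.exp(-beta) / math.expm1(beta)
    inner = 1.0 - 1.0 / (n - s + math.exp(beta) / (1.0 - g(beta, n, sigma_e)))
    return lead * inner

n, s = 100, 10                                # assumed sizes (n = 10 s)
for sigma_e in (0.03, 0.1, 1.0 / math.sqrt(s), 1.0, 30.0):
    print(f"sigma_e = {sigma_e:8.4f}   normalized term = {extra_term(n, s, sigma_e):.3e}")
```

Under these assumed sizes the normalized term is negligible well below $1/\sqrt{s}$ and of order one well above it, which matches the qualitative description of the two regimes.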

The theory of the transition point $\sigma_{e,t}$ may have a geometrical explanation which is not rigorous but very intuitive. The set $\mathcal{X}_s$ can be regarded as the union of $s$-dimensional hyperplanes, each spanned by $s$ coordinate axes. The sparse parameter $\mathbf{x}$ lies on one of the hyperplanes $\Sigma_s$ and is surrounded by a noise ball whose radius satisfies $r^2 = \sigma_x^2$. As $x_q \to +\infty$, the radius satisfies $r^2 \approx \sigma_e^2\|\mathbf{x}\|_2^2 \approx s\sigma_e^2 x_q^2$. The transition point corresponds to the tangency of the noise ball to one of the hyperplanes of $\mathcal{X}_s$ apart from $\Sigma_s$, and the high and low $\sigma_e$ regimes correspond to whether or not the noise ball intersects another hyperplane (see Fig. 2). It can be easily verified that this geometrical interpretation gives the correct value of the transition point $\sigma_{e,t}$.
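A short sketch of that verification, under the simplifying assumption used in this subsection that all nonzero entries of $\mathbf{x}$ equal $x_q$: the nearest hyperplane other than the one containing $\mathbf{x}$ is obtained by exchanging one support axis for a non-support axis, so its distance from $\mathbf{x}$ is $|x_q|$. The symbol $\Sigma_s'$ for this neighboring hyperplane is notation introduced here for illustration only.

$$\operatorname{dist}(\mathbf{x}, \Sigma_s') = |x_q|, \qquad r \approx \sqrt{s}\,\sigma_e |x_q| \ \ (x_q \to +\infty), \qquad r = \operatorname{dist}(\mathbf{x}, \Sigma_s') \iff \sigma_e = \frac{1}{\sqrt{s}}.$$

The tangency condition therefore reproduces the value $1/\sqrt{s}$ obtained from the exponentials.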



Fig. 2. The geometrical interpretation of the transition point $\sigma_{e,t}$. The hyperplanes are part of the set $\mathcal{X}_s$, and the sparse parameter $\mathbf{x}$ lies on $\Sigma_s$. The radius of the noise ball satisfies $r^2 = \sigma_x^2$. If this ball intersects another hyperplane, the situation belongs to the high $\sigma_e$ regime; otherwise it belongs to the low $\sigma_e$ regime. For the low $\sigma_e$ regime, the HCRB equals the corresponding CCRB approximately if $x_q$ is sufficiently large; for the high $\sigma_e$ regime, the HCRB is evidently higher than the corresponding CCRB.



V. Numerical Results 

In this section, numerical simulations are performed in order 
to substantiate the theoretical results presented in the previous 
sections. 

A. Numerical Analysis of $\gamma_{\mathrm{CCRB}}$

Numerical experiments are first made on the CCRB, or equivalently, on the quantity $\gamma_{\mathrm{CCRB}}$. We wish to verify that the formula (33) is valid and can demonstrate how $\gamma_{\mathrm{CCRB}}$ varies versus the perturbation level $c_e$ and the noise level $c_n$. Before we perform the numerical simulations, it is worthwhile to analyze the formula (33) first with the help of its graph obtained by numerical approaches.

Define the function in the approximate formula (33)

$$\gamma(c_e, c_n) = \frac{1}{s}\cdot\frac{2c_e}{2c_e + 1 + c_n/c_e}. \tag{46}$$

The graphs of the approximate function $\gamma(c_e, c_n)$ with varying $c_e$ and $c_n$ are shown in Fig. 3. It can be seen that when $c_n$ is fixed, the function $\gamma(c_e, c_n)$ is monotonically increasing in $c_e$, with limits $\gamma(0^+, c_n) = 0$ and $\gamma(+\infty, c_n) = 1/s$. It can also be seen that for each fixed $c_n$, there exists a transition point $c_{e,t}$ of the curve which approximately separates the high $c_e$ regime and the low $c_e$ regime: when $c_e \ll c_{e,t}$, one has $\gamma(c_e, c_n) \approx 0$, while for $c_e \gg c_{e,t}$, one has $\gamma(c_e, c_n) \approx 1/s$. This transition point can be defined as the point such that $\gamma(c_{e,t}, c_n) = 1/2s$. For the special case $c_n = 0$, the transition point is $c_{e,t} = 1/2 \approx -3$ dB; for general $c_n > 0$, $c_{e,t}$ is the positive root of the quadratic equation $2x^2 - x - c_n = 0$.

When $c_e$ is fixed, the function $\gamma(c_e, c_n)$ is monotonically decreasing in $c_n$. Fig. 3 also indicates that the transition point $c_{e,t}(c_n)$ is a monotonically increasing function of $c_n$, which is a direct corollary of the monotonicity of $\gamma(c_e, c_n)$ with respect to $c_n$.
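The approximate formula and its transition point are easy to evaluate. The following Python sketch implements (46) and solves $2x^2 - x - c_n = 0$ for its positive root. The function names, the sparsity `s = 10`, and the sample noise levels are illustrative assumptions; levels are converted to dB as $10\log_{10}(\cdot)$, consistent with $1/2 \approx -3$ dB.

```python
import math

def gamma(c_e, c_n, s):
    """Approximate formula (46); c_e and c_n are linear quantities, not dB."""
    return (1.0 / s) * 2.0 * c_e / (2.0 * c_e + 1.0 + c_n / c_e)

def transition_point(c_n):
    """Positive root of 2 x^2 - x - c_n = 0, where gamma equals 1/(2s)."""
    return (1.0 + math.sqrt(1.0 + 8.0 * c_n)) / 4.0

def to_db(value):
    return 10.0 * math.log10(value)

s = 10
for c_n in (0.0, 10 ** -0.5, 1.0):           # 10**-0.5 corresponds to -5 dB
    c_et = transition_point(c_n)
    print(f"c_n = {c_n:.4f}  c_et = {c_et:.4f} ({to_db(c_et):+.2f} dB)  "
          f"gamma(c_et) = {gamma(c_et, c_n, s):.4f}")
```

At every transition point the printed value of `gamma` equals $1/2s$, and the transition point grows with $c_n$.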



Fig. 3. The graphs of the approximate formula of $\gamma_{\mathrm{CCRB}}$ as a function of $c_e$ with different settings of $c_n$. The theoretical approximate formula is given by (33).



In order to verify that the general trend of $\gamma_{\mathrm{CCRB}}$ can be described by the function $\gamma(c_e, c_n)$, numerical results of the quantity $\gamma_{\mathrm{CCRB}}$ are simulated. The size of the sensing matrix $\mathbf{A}$ and the sparsity of $\mathbf{x}$ are set to be $n = 20s$, $m = 10s$, and $s = 10$. The perturbation level $c_e$ varies from 20 dB to $-20$ dB, and the noise level $c_n$ is set to be 0, $-5$ dB, and 15 dB respectively. The entries of $\mathbf{A}$ are drawn i.i.d. from the Gaussian distribution $\mathcal{N}(0, 1/m)$. Such generation of $\mathbf{A}$ is standard in the field of compressive sensing, and ensures the existence of the constants $\vartheta_{l,s}$ and $\vartheta_{u,s}$ with overwhelming probability [35], [36]. For the parameter $\mathbf{x}$, the support is equiprobably chosen from all subsets of $\{1, \ldots, n\}$ with size $s$, and the nonzero entries are i.i.d. with the Bernoulli-type distribution $P(x_k = -1) = P(x_k = 1) = 1/2$, $k \in \mathrm{supp}(\mathbf{x})$. Each simulation runs 30 times so that 30 different groups of $\mathbf{A}$ and $\mathbf{x}$ are tested.
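The data generation described above can be sketched with the Python standard library as follows. This is only an illustration of the stated setup; the function name `draw_problem` and the seed are hypothetical, and the computation of $\gamma_{\mathrm{CCRB}}$ itself is not reproduced here.

```python
import math
import random

def draw_problem(s, rng):
    """Draw one (A, x) pair with n = 20 s and m = 10 s."""
    n, m = 20 * s, 10 * s
    std = 1.0 / math.sqrt(m)                       # entries ~ N(0, 1/m)
    A = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]
    support = sorted(rng.sample(range(n), s))      # uniform over size-s subsets
    x = [0.0] * n
    for k in support:
        x[k] = rng.choice((-1.0, 1.0))             # P(x_k = -1) = P(x_k = 1) = 1/2
    return A, x, support

rng = random.Random(0)                             # the seed is an arbitrary choice
trials = [draw_problem(10, rng) for _ in range(30)]
A, x, support = trials[0]
print(len(A), len(A[0]), len(support))
```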

The results of these simulations are shown in Fig. 4. The points marked by "x" are raw numerical simulation results, and the solid lines in these figures are graphs of the approximate function $\gamma(c_e, c_n)$. It can be seen from the simulation results that the curves of the function $\gamma(c_e, c_n)$ correctly describe how $\gamma_{\mathrm{CCRB}}$ varies versus $c_e$ and $c_n$. These figures also demonstrate that the solid curves lie almost in the middle of the raw data points, which partially justifies the validity of the approximate formula and the bounds given by (28) and (29).

Next we wish to verify the asymptotic behavior of $\gamma_{\mathrm{CCRB}}$, i.e. the $s^{-1}$ law given by Corollary 1. This time the sparsity $s$ varies from 3 to 300 with exponentially increasing increments, but $n$ and $m$ are still set to be $n = 20s$ and $m = 10s$. The generation of $\mathbf{A}$ and $\mathbf{x}$ is the same as in the previous simulations. Previous literature has verified theoretically and experimentally that such a setting of $\mathbf{A}$ can ensure the existence of the constants $\vartheta_l$ and $\vartheta_u$ with high probability [36]. For the perturbation and noise levels, nine groups of values are employed.

The results are illustrated in Fig. 5. The dotted points are raw experimental data of $\gamma_{\mathrm{CCRB}}$, and the solid straight line is the $s^{-1}$ line calculated by the approximate formula (33). Notice that we have employed log-log scaling of the coordinate system so that the $s^{-1}$ law is represented by a straight line with slope $-1$ in the graph, which makes it convenient to check the theoretical results. It can be seen from the graphs that the data points can be upper and lower bounded by two $s^{-1}$ curves, and that they cluster near the straight line given by the approximate formula (33). Therefore it can be concluded that the trend of $\gamma_{\mathrm{CCRB}}$ with respect to $s$ can be described by the $s^{-1}$ law.

B. Numerical Analysis of the HCRB 

The theoretical result for the unit sensing matrix is given by Theorem 4. It suggests a continuous transition from the maximal support case to the nonmaximal support case and is fairly easy to compute. On the other hand, the demand for a closed-form expression compromises potential tightness. Generally, more testing points $\mathbf{x} + \mathbf{v}_i$ will lead to tighter lower bounds which, however, will not possess simple closed forms and can only be evaluated by numerical methods. Therefore we will simulate the lower bound given by Theorem 4 as well as lower bounds computed numerically with more testing points.

Before running the simulations, we first briefly review Theorem 4. It can be seen that the lower bound given by (39) consists of two parts: the maximal support CCRB and the additional term which bridges the CCRB gap. The emphasis of the numerical experiments will mainly be put on the additional term, and the common factor $\sigma_x^2$ will be ignored. In other words, our analysis will be focused on the following quantity:

$$d_{\mathrm{HCRB}} = \frac{(n-s)\beta\exp(-\beta)}{\exp\beta - 1}\left(1 - \frac{1}{n - s + \exp\beta\cdot\left(1 - g(\beta)\right)^{-1}}\right). \tag{47}$$

Detailed settings of the numerical simulations are as follows. For simplicity and convenience of analysis, the dimensions are $n = m = 10s$. The parameter $\mathbf{x}$ is set to be $[x_q, x_q, 0, \ldots, 0]^T$. Different settings of $\sigma_e$ and $\sigma_n$ are employed, and for each group of $\{\sigma_e, \sigma_n\}$ we simulate the HCRB with varying $x_q$ to get a graph of the quantity $d_{\mathrm{HCRB}}$.

As mentioned in Subsection IV-A, there exists a transition point $\sigma_{e,t}$ which separates the high and low $\sigma_e$ regimes. The first task is to verify the theory of the transition point presented previously. The sparsities are set to be 1, 3, 10, 30 and 100 respectively, and for each setting of the sparsity, $\sigma_e$ ranges from 0.0001 to 100. The measurement noise $\sigma_n$ is set to be 0.1, and the HCRB for $x_q = 1000$ is computed as a good approximation for the $x_q \to +\infty$ case.

Fig. 6. $d_{\mathrm{HCRB}}/(n - s)$ versus $\sigma_e$ with different settings of $s$.

The numerical results are shown in Fig. 6, where $d_{\mathrm{HCRB}}$ is "normalized" by $n - s$ so that the curves share a similar scale. It can be seen from the figure that the transition point for each $s$ exists and can be well evaluated by $\sigma_{e,t}^2 = 1/s$. For $\sigma_e \ll \sigma_{e,t}$, $d_{\mathrm{HCRB}}/(n - s)$ is much less than 1, and as $\sigma_e \to 0$, $d_{\mathrm{HCRB}}/(n - s)$ tends to zero rapidly; for $\sigma_e > \sigma_{e,t}$, $d_{\mathrm{HCRB}}/(n - s)$ cannot be neglected, and as $\sigma_e \to +\infty$, $d_{\mathrm{HCRB}}/(n - s)$ possesses a limit which is of order 1.

Next we fix $\sigma_e$ and $\sigma_n$, and test how $d_{\mathrm{HCRB}}$ varies versus $x_q$. The sparsity $s$ is set to 1 for simplicity. Results are shown in Fig. 7. Fig. 7(a) corresponds to the situation where $\sigma_n = 0.1$ is fixed with different values of $\sigma_e$. It can be seen from the figure that when $\sigma_e$ belongs to the low value regime (i.e. $\sigma_e \ll \sigma_{e,t}$, where $\sigma_{e,t} = 1$ here), its effect on $d_{\mathrm{HCRB}}$ can be neglected. Moreover, each curve can be separated into three regions: the low $x_q$ region where $\mathrm{HCRB} \approx \mathrm{CCRB}_{\text{nonmax supp}}$, the high $x_q$ region where $\mathrm{HCRB} \approx \mathrm{CCRB}_{\text{max supp}}$, and the transition region which connects the low and high $x_q$ regions. However, when $\sigma_e$ exceeds $\sigma_{e,t}$, the behavior of $d_{\mathrm{HCRB}}$ becomes rather different and exhibits a more complex pattern. These phenomena demonstrate the way matrix perturbation influences the HCRB.

Fig. 7(b) corresponds to the situation where $\sigma_e = 1$ is fixed with different values of $\sigma_n$. When $\sigma_n = 0$, it can be seen that $d_{\mathrm{HCRB}}$ remains constant because $\beta = 1/\sigma_e^2$ in this case⁴. When $\sigma_n \neq 0$, it can be seen from the figure that the curves are translations of each other. This can be explained by the expression $\beta = x_q^2/(\sigma_n^2 + \sigma_e^2 x_q^2)$, in which scaling $\sigma_n$ by a factor is equivalent to scaling $x_q$ by the same factor. When $s \geq 2$, the curves will generally not be translations of each other, and the behavior with varying $\sigma_n$ will be more complex. Nevertheless, the limits $d_{\mathrm{HCRB}}|_{x_q \to 0}$ and $d_{\mathrm{HCRB}}|_{x_q \to \infty}$ will be the same for all $\sigma_n$.
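The scaling argument for $s = 1$ can be checked directly. In the sketch below, the function name `beta` and the numerical values are illustrative choices; the formula is the expression for $\beta$ quoted above.

```python
def beta(x_q, sigma_n, sigma_e):
    """Worst case entry SNR for s = 1: x_q^2 / (sigma_n^2 + sigma_e^2 x_q^2)."""
    return x_q**2 / (sigma_n**2 + sigma_e**2 * x_q**2)

sigma_e = 1.0
# Scaling sigma_n and x_q by the same factor leaves beta unchanged.
print(beta(2.0, 0.1, sigma_e), beta(20.0, 1.0, sigma_e))
# Without measurement noise, beta no longer depends on x_q.
print(beta(0.5, 0.0, sigma_e), beta(50.0, 0.0, sigma_e))
```

Since $d_{\mathrm{HCRB}}$ for $s = 1$ depends on $x_q$ and $\sigma_n$ only through $\beta$, a change of $\sigma_n$ shifts the curve along a logarithmic $x_q$ axis without changing its shape.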

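⁴ When $s \geq 2$, however, $d_{\mathrm{HCRB}}$ will generally not remain constant as $x_q$ varies.

As mentioned before, more testing points of the HCRB will generally lead to a tighter lower bound, and thus we also compute the HCRB numerically with the following set of testing points in order to get a tighter bound as well as to compare it with the theoretical result:

$$\begin{aligned}
\{\mathbf{v}_i\} &= \mathcal{V}_{\mathrm{supp}} \cup \mathcal{V}_{\mathrm{nonsupp}}, \\
\mathcal{V}_{\mathrm{supp}} &= \{0.01\,x_i\mathbf{e}_i : i \in S\} \cup \{-x_i\mathbf{e}_i : i \in S\}, \\
\mathcal{V}_{\mathrm{nonsupp}} &= \{0.01\,x_k\mathbf{e}_i - x_k\mathbf{e}_k : i \notin S, k \in S\} \cup \{x_k\mathbf{e}_i - x_k\mathbf{e}_k : i \notin S, k \in S\},
\end{aligned} \tag{48}$$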

where $S$ is the support of $\mathbf{x}$. The settings of the parameter $\mathbf{x}$ are the same as in the previous experiment. The results are shown in Fig. 8, where the three typical cases $\{\sigma_n = 0.1, \sigma_e = 0.1\}$, $\{\sigma_n = 0.1, \sigma_e = 1\}$ and $\{\sigma_n = 0.1, \sigma_e = 3\}$ are tested. It can be seen that when $\sigma_e$ is below $\sigma_{e,t}$, the theoretical bound given by Theorem 4 is rather close to the numerical bound; the major difference lies in the transition region of $x_q$, where the theoretical bound is slightly less than the numerical one. On the other hand, when $\sigma_e$ is above $\sigma_{e,t}$, the numerical bound is not only larger than the theoretical one but also exhibits a rather complicated behavior. These behaviors of the HCRB reveal the complexity of the sensing matrix perturbation situation.

Fig. 8. Theoretical and numerical values of $d_{\mathrm{HCRB}}$. The measurement noise level is $\sigma_n = 0.1$. In the numerical case, $d_{\mathrm{HCRB}}$ is defined as $(\mathrm{HCRB} - \mathrm{CCRB}_{\text{max supp}})/\sigma_x^2$.



VI. Conclusion 

In this paper, the performance bound of sparse estimation with general perturbation has been studied. Two widely used types of lower bounds, the CCRB and the HCRB, have been calculated and analyzed. For the CCRB, we have derived its closed-form expression and analyzed its behavior. It has been shown that the CCRB is more complicated if there exists sensing matrix perturbation, but the additional term can be bounded and tends to zero in a relative sense if the sensing matrix satisfies RIP-type conditions. For the HCRB, only the special case with unit sensing matrix is studied due to the complexity of calculation, but the results are still instructive in that they are easy to analyze and can provide qualitative comprehension. It has been shown that the HCRB provides a tighter lower bound than the CCRB, and that it exhibits a satisfactory transition behavior when the smallest entry of the parameter tends to zero. Numerical simulations have also been performed to substantiate these theoretical results as well as to give a more quantitative understanding of the theorems and formulas.

There are several future directions to be explored. First, the HCRB is obtained only for the unit sensing matrix case, which is useful for a qualitative comprehension but has apparent limitations. In many cases the sensing matrix cannot be assumed unit, and a more precise quantitative study of the performance bound also requires generalizing the HCRB to the general sensing matrix case. There may be two practical approaches to this problem. One is to derive a closed-form lower bound which is convenient to analyze and understand; the other is to find a tractable way to numerically compute the lower bound. However, both approaches need further research and have not yet produced useful results.

Second, the HCRB provides a lower bound for globally unbiased estimators. However, recovery algorithms are quite likely to be biased in the sparse setting, and the bias is usually dependent on the noise variance. Moreover, there are occasions on which biased estimators can achieve a lower MSE than unbiased estimators [38]. These problems also point out a possible direction for future study.

Appendix A 
Proof of Lemma 1

We first compute the likelihood function p(y; x). Notice that 
the problem (01 can be re-formulated as 

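$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{E}\mathbf{x} + \mathbf{n} = \mathbf{A}\mathbf{x} + \mathbf{n}_x, \tag{49}$$

where $\mathbf{n}_x = \mathbf{E}\mathbf{x} + \mathbf{n}$ denotes the equivalent noise. Apparently (1) and (49) share a similar formulation, which hints at a close relation between the two conditions of noisy observation and general perturbation.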

Let us first find the probability distribution of Ex. Because 
the elements of E are drawn i.i.d. from Gaussian distribution, 
it is well-known from elementary probability theory that Ex is 
an m-dimensional Gaussian-distributed random vector. It can 
also be seen that Ex is independent of n. 

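Denoting the $(i,j)$th element of $\mathbf{E}$ as $e_{ij}$ and the $k$th element of $\mathbf{x}$ as $x_k$, the $i$th component of $\mathbf{E}\mathbf{x}$ can be formulated as

$$(\mathbf{E}\mathbf{x})_i = \sum_{k=1}^{n} e_{ik}x_k, \tag{50}$$

from which it is obviously seen that $E[\mathbf{E}\mathbf{x}] = \mathbf{0}$. The covariance of $(\mathbf{E}\mathbf{x})_i$ and $(\mathbf{E}\mathbf{x})_j$ is given by

$$\begin{aligned}
\mathrm{Cov}\left((\mathbf{E}\mathbf{x})_i, (\mathbf{E}\mathbf{x})_j\right) &= E\left[(\mathbf{E}\mathbf{x})_i(\mathbf{E}\mathbf{x})_j\right] = E\left[\sum_{k,l=1}^{n} e_{ik}e_{jl}x_kx_l\right] = \sum_{k,l=1}^{n} x_kx_l E[e_{ik}e_{jl}] \\
&= \sum_{k,l=1}^{n} x_kx_l\,\sigma_e^2\delta_{ij}\delta_{kl} = \sigma_e^2\|\mathbf{x}\|_2^2\,\delta_{ij},
\end{aligned} \tag{51}$$

where $\delta_{ij}$ is the Kronecker delta. Thus the covariance matrix of $\mathbf{E}\mathbf{x}$ is

$$\mathrm{Cov}(\mathbf{E}\mathbf{x}) = \sigma_e^2\|\mathbf{x}\|_2^2\,\mathbf{U}_m. \tag{52}$$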

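The above result shows that the entries of the additive noise $\mathbf{E}\mathbf{x}$ can be viewed as i.i.d. Gaussian noises with variance $\sigma_e^2\|\mathbf{x}\|_2^2$.

Given the probability distribution of $\mathbf{E}\mathbf{x}$, it can be derived from the mutual independence of $\mathbf{E}\mathbf{x}$ and $\mathbf{n}$ that the total noise term $\mathbf{n}_x = \mathbf{E}\mathbf{x} + \mathbf{n}$ satisfies the distribution $\mathcal{N}(\mathbf{0}, (\sigma_e^2\|\mathbf{x}\|_2^2 + \sigma_n^2)\mathbf{U}_m)$. Therefore the likelihood function is given by

$$p(\mathbf{y};\mathbf{x}) = \frac{1}{(2\pi\sigma_x^2)^{m/2}}\exp\left(-\frac{(\mathbf{y} - \mathbf{A}\mathbf{x})^T(\mathbf{y} - \mathbf{A}\mathbf{x})}{2\sigma_x^2}\right), \tag{53}$$

where

$$\sigma_x^2 = \sigma_e^2\|\mathbf{x}\|_2^2 + \sigma_n^2. \tag{54}$$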



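Next we compute the FIM with the help of the likelihood function. With $\sigma_x^2 = \sigma_e^2\mathbf{x}^T\mathbf{x} + \sigma_n^2$ as in (54), the gradient of the log likelihood function $\ln p(\mathbf{y};\mathbf{x})$ is given by

$$\nabla_{\mathbf{x}}\ln p(\mathbf{y};\mathbf{x}) = \frac{\sigma_e^2}{\sigma_x^2}\left(\frac{\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2}{\sigma_x^2} - m\right)\mathbf{x} + \frac{1}{\sigma_x^2}\mathbf{A}^T(\mathbf{y} - \mathbf{A}\mathbf{x}), \tag{55}$$

and therefore

$$\begin{aligned}
\left(\nabla_{\mathbf{x}}\ln p(\mathbf{y};\mathbf{x})\right)\left(\nabla_{\mathbf{x}}^T\ln p(\mathbf{y};\mathbf{x})\right) ={}& \left(\frac{m^2\sigma_e^4}{\sigma_x^4} - \frac{2m\sigma_e^4\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2}{\sigma_x^6} + \frac{\sigma_e^4\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^4}{\sigma_x^8}\right)\mathbf{x}\mathbf{x}^T \\
&+ \frac{1}{\sigma_x^4}\mathbf{A}^T(\mathbf{y} - \mathbf{A}\mathbf{x})(\mathbf{y} - \mathbf{A}\mathbf{x})^T\mathbf{A} \\
&+ \text{terms linear in } (\mathbf{y} - \mathbf{A}\mathbf{x}) + \text{terms cubic in } (\mathbf{y} - \mathbf{A}\mathbf{x}).
\end{aligned} \tag{56}$$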

The specific forms of the linear and cubic terms in the above equation are of no importance; they vanish when we take the expectation because $\mathbf{y} - \mathbf{A}\mathbf{x}$ is Gaussian-distributed with zero mean and diagonal covariance. The expectation of $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2$ is $m(\sigma_e^2\mathbf{x}^T\mathbf{x} + \sigma_n^2)$, while for $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^4$ the expectation is $(m^2 + 2m)(\sigma_e^2\mathbf{x}^T\mathbf{x} + \sigma_n^2)^2$, which can be seen from the fact that for a series of i.i.d. zero-mean Gaussian random variables $w_1, \ldots, w_k$ with the same variance $q^2$, one has

$$E\left[\left(\sum_{i=1}^{k} w_i^2\right)^2\right] = \sum_{i=1}^{k} E[w_i^4] + 2\sum_{i=1}^{k}\sum_{j=i+1}^{k} E[w_i^2w_j^2] = k\cdot 3q^4 + k(k-1)(q^2)^2 = (k^2 + 2k)q^4, \tag{57}$$

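where the fact from elementary probability theory that $E[w_i^4] = 3q^4$ is used. By taking the expectation of (56), and recalling the covariance matrix of $\mathbf{y} - \mathbf{A}\mathbf{x} = \mathbf{E}\mathbf{x} + \mathbf{n}$, it can be shown that

$$\begin{aligned}
\mathbf{J}(\mathbf{x}) &= \left(\frac{m^2\sigma_e^4}{\sigma_x^4} - \frac{2m^2\sigma_e^4}{\sigma_x^4} + \frac{(m^2 + 2m)\sigma_e^4}{\sigma_x^4}\right)\mathbf{x}\mathbf{x}^T + \frac{1}{\sigma_x^2}\mathbf{A}^T\mathbf{A} \\
&= \frac{1}{\sigma_e^2\mathbf{x}^T\mathbf{x} + \sigma_n^2}\mathbf{A}^T\mathbf{A} + \frac{2m\sigma_e^4}{(\sigma_e^2\mathbf{x}^T\mathbf{x} + \sigma_n^2)^2}\mathbf{x}\mathbf{x}^T,
\end{aligned}$$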

which is exactly (TBI ). 
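The fourth-moment identity (57), which produces the factor $2m$ in the rank-one term of the FIM, can be checked by a simple Monte Carlo experiment. The sketch below uses only the Python standard library; the function name, the values of `k` and `q`, the seed, and the number of trials are arbitrary illustrative choices.

```python
import random

def fourth_moment_of_norm(k, q, trials, rng):
    """Monte Carlo estimate of E[(w_1^2 + ... + w_k^2)^2] for w_i ~ N(0, q^2)."""
    total = 0.0
    for _ in range(trials):
        sq_norm = sum(rng.gauss(0.0, q) ** 2 for _ in range(k))
        total += sq_norm ** 2
    return total / trials

k, q = 5, 2.0                      # arbitrary illustrative values
rng = random.Random(1)             # fixed seed for reproducibility
estimate = fourth_moment_of_norm(k, q, 200_000, rng)
exact = (k**2 + 2 * k) * q**4      # right-hand side of (57)
print(estimate, exact)
```

With $k = m$ and $q^2 = \sigma_x^2$, the coefficient of $\mathbf{x}\mathbf{x}^T$ becomes $(m^2 - 2m^2 + m^2 + 2m)\sigma_e^4/\sigma_x^4 = 2m\sigma_e^4/\sigma_x^4$.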

Appendix B 
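Proof of Proposition 2

Given the support $S$, the subspace $\mathcal{F}$ is given by

$$\mathcal{F} = \mathrm{span}(\{\mathbf{e}_{s_1}, \ldots, \mathbf{e}_{s_s}\}),$$

therefore the unbiasedness in the neighborhood of $\mathbf{x} \in \{\mathbf{u} : \mathrm{supp}(\mathbf{u}) = S\}$ only requires that the bias function $\mathbf{b}(\mathbf{x}) = E[\hat{\mathbf{x}}_{\mathrm{pinv}} - \mathbf{x}]$ is zero on $\{\mathbf{u} : \mathrm{supp}(\mathbf{u}) = S\}$. It can be seen that for any $\mathbf{x}$ whose support is $S$, one has

$$\begin{aligned}
\left(E[\hat{\mathbf{x}}_{\mathrm{pinv}}(\mathbf{y})]\right)_S &= E[\mathbf{A}_S^{\dagger}\mathbf{y}] = E[\mathbf{A}_S^{\dagger}(\mathbf{A}_S\mathbf{x}_S + \mathbf{E}\mathbf{x} + \mathbf{n})] \\
&= \mathbf{A}_S^{\dagger}\mathbf{A}_S\mathbf{x}_S + \mathbf{A}_S^{\dagger}E[\mathbf{E}]\mathbf{x} + \mathbf{A}_S^{\dagger}E[\mathbf{n}] \\
&= \mathbf{x}_S,
\end{aligned} \tag{58}$$

and

$$\left(E[\hat{\mathbf{x}}_{\mathrm{pinv}}(\mathbf{y})]\right)_{S^c} = E[(\hat{\mathbf{x}}_{\mathrm{pinv}}(\mathbf{y}))_{S^c}] = \mathbf{0}. \tag{59}$$

Thus one has

$$E[\hat{\mathbf{x}}_{\mathrm{pinv}}(\mathbf{y})] = \mathbf{x}, \quad \forall \mathbf{x} \in \{\mathbf{u} : \mathrm{supp}(\mathbf{u}) = S\}. \tag{60}$$

The MSE of the oracle pseudoinverse estimator $\hat{\mathbf{x}}_{\mathrm{pinv}}$ for $\mathbf{x} \in \{\mathbf{u} : \mathrm{supp}(\mathbf{u}) = S\}$ equals that of the oracle estimator,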
which could be found, for instance, in Q, ll20l . 



Appendix C 
Proof of TheoremO 

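Assumption 1 implies that the eigenvalues of $\mathbf{A}_S^T\mathbf{A}_S$, denoted by $\lambda_i$, are bounded as follows:

$$1 - \vartheta_{l,s} \leq \lambda_i \leq 1 + \vartheta_{u,s}, \quad i = 1, \ldots, s. \tag{61}$$

Therefore the eigenvalues of $(\mathbf{A}_S^T\mathbf{A}_S)^{-1}$, denoted by $\tilde{\lambda}_i$, are bounded as follows:

$$\frac{1}{1 + \vartheta_{u,s}} \leq \tilde{\lambda}_i \leq \frac{1}{1 - \vartheta_{l,s}}.$$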

One can further derive from the above inequalities that 



(62) 



<\\(A-A s y^ s \\ 2 2 <-^ 



(l+^u, S ) 5 



1,3 
Substituting these inequalities into (27), one will obtain

2 1 + $u,s 



dcCKB < cr 



2mal 



1 + 2mcr 2 + ^ 

1 - *.„ 



C„(l+l?u,a) ' 



(63) 



(l + l?u, S ) 2 



2m<j 2 



1 + 2too\ 2 - #i s + 



c„(l-i?i, 3 ) 



Next we wish to bound the term $m\sigma_e^2$. It can be seen from the definition of the perturbation level $c_e$ that

$$m\sigma_e^2 = c_e\,\frac{\mathrm{tr}(\mathbf{A}_S^T\mathbf{A}_S)}{s}. \tag{64}$$

With the help of (61), the bounds of $m\sigma_e^2$ can be readily given by

$$c_e(1 - \vartheta_{l,s}) \leq m\sigma_e^2 \leq c_e(1 + \vartheta_{u,s}). \tag{65}$$

Substituting this into (63), one will obtain the bounds given by (28) and (29).

The bounds of $\gamma_{\mathrm{CCRB}}$ can also be derived by similar techniques. One will get the bounds (30) and (31) by only substituting the following inequalities into (28) and (29):

$$\frac{s}{1 + \vartheta_{u,s}} \leq \mathrm{tr}\left((\mathbf{A}_S^T\mathbf{A}_S)^{-1}\right) \leq \frac{s}{1 - \vartheta_{l,s}}. \tag{66}$$



Appendix D 
Proof of Lemma 2

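We first refer to [23] for the definition of the multivariate HCRB.

Proposition 3: [23] Suppose $p(\mathbf{y};\mathbf{x})$ are a class of pdf's parameterized by $\mathbf{x} \in \mathcal{X}$, and let $\mathbf{x}, \mathbf{x} + \mathbf{v}_1, \ldots, \mathbf{x} + \mathbf{v}_k$ be test points contained in the constrained parameter set $\mathcal{X}$. Define

$$\delta_i\mathbf{m}_x = \mathbf{m}_{x + v_i} - \mathbf{m}_x, \quad i = 1, \ldots, k, \tag{67}$$

$$\delta\mathbf{m}_x = [\delta_1\mathbf{m}_x, \ldots, \delta_k\mathbf{m}_x], \tag{68}$$

$$\delta_i p = p(\mathbf{y};\mathbf{x} + \mathbf{v}_i) - p(\mathbf{y};\mathbf{x}), \tag{69}$$

$$\delta\mathbf{p} = [\delta_1 p, \ldots, \delta_k p]^T, \tag{70}$$

where $\mathbf{m}_x \in \mathbb{R}^n$ is a vector-valued function of $\mathbf{x}$. Then for any estimator $\hat{\mathbf{x}}$ with mean $\mathbf{m}_x$, i.e. $E_{y;x}[\hat{\mathbf{x}}] = \mathbf{m}_x$, the estimator covariance matrix $\mathrm{Cov}(\hat{\mathbf{x}})$ satisfies the matrix inequality

$$\mathrm{Cov}(\hat{\mathbf{x}}) \succeq \delta\mathbf{m}_x\left(E_{y;x}\left[\frac{\delta\mathbf{p}}{p}\,\frac{\delta\mathbf{p}^T}{p}\right]\right)^{\dagger}\delta\mathbf{m}_x^T, \tag{71}$$

where $p$ denotes $p(\mathbf{y};\mathbf{x})$ for short.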

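Next Proposition 3 will be applied to the sparse estimation problem with general perturbation. For unbiased estimators, it is easily seen that $\mathbf{m}_x = \mathbf{x}$, and thus the matrix $\delta\mathbf{m}_x$ equals $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_k]$. For the matrix

$$\mathbf{H} = E_{y;x}\left[\frac{\delta\mathbf{p}}{p}\,\frac{\delta\mathbf{p}^T}{p}\right], \tag{72}$$

its $(i,j)$th element is given by

$$\begin{aligned}
H_{ij} &= E_{y;x}\left[\frac{\delta_i p}{p}\,\frac{\delta_j p}{p}\right] = E_{y;x}\left[\left(\frac{p(\mathbf{y};\mathbf{x} + \mathbf{v}_i)}{p(\mathbf{y};\mathbf{x})} - 1\right)\left(\frac{p(\mathbf{y};\mathbf{x} + \mathbf{v}_j)}{p(\mathbf{y};\mathbf{x})} - 1\right)\right] \\
&= E_{y;x}\left[\frac{p(\mathbf{y};\mathbf{x} + \mathbf{v}_i)\,p(\mathbf{y};\mathbf{x} + \mathbf{v}_j)}{p^2(\mathbf{y};\mathbf{x})}\right] - E_{y;x}\left[\frac{p(\mathbf{y};\mathbf{x} + \mathbf{v}_i)}{p(\mathbf{y};\mathbf{x})}\right] - E_{y;x}\left[\frac{p(\mathbf{y};\mathbf{x} + \mathbf{v}_j)}{p(\mathbf{y};\mathbf{x})}\right] + 1.
\end{aligned} \tag{73}$$

The second term of the above equation is

$$-E_{y;x}\left[\frac{p(\mathbf{y};\mathbf{x} + \mathbf{v}_i)}{p(\mathbf{y};\mathbf{x})}\right] = -\int p(\mathbf{y};\mathbf{x} + \mathbf{v}_i)\,d\mathbf{y} = -1, \tag{74}$$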

and similarly the third term also equals —1. The first term is 

>(y;x + Vi)p(y;x + Vj-)" 



E. 



y:x 



p 2 (y;x) 
p(y;x + v. t )p(y;x + 

p(y;x) 



27rcr x+v,°x+v, 



cxp 



dy 



T A T 



v/A 



T A T 



v A 



exp 



X > X. V,; ,Vj 



x+v,: x+v 7 - 



X+Vi ^X+Vj 

lly- Axil 2 



2c 2 



(y - Ax) 
dy 



exp 



where 



Av, 



|Av,|| 2 



Av, 



'X+Vi 



(75) 
(76) 



Substituting these results into (73), one will readily obtain (37), and Lemma 2 is proved.

Appendix E 
Proof of Theorem 4

For simplicity, it is assumed without loss of generality that the support of $\mathbf{x}$ is $S = \{1, 2, \ldots, s\}$. We split the MSE of an estimator $\hat{\mathbf{x}}$ into two parts as follows:

$$\mathrm{mse}(\hat{\mathbf{x}}) = \sum_{i=1}^{s} E[(\hat{x}_i - x_i)^2] + \sum_{i=s+1}^{n} E[(\hat{x}_i - x_i)^2], \tag{77}$$

i.e. the support part and the non-support part. We choose different sets of $\mathbf{v}_i$ for the two parts respectively, and in the end combine these two parts to get a lower bound. This approach results from the fact that the MSE of a vector-valued estimator is the sum of the MSEs of its components.

For the lower bound of the support part, the following $\{\mathbf{v}_i\}_{i=1}^{s}$ is employed:

$$\mathbf{v}_i = t\mathbf{e}_i, \quad i = 1, \ldots, s, \tag{78}$$

where $t$ is an arbitrary real number. After the corresponding covariance matrix is obtained, we take the limit $t \to 0$ and afterwards take the sum of only the first $s$ diagonal elements to obtain the lower bound of the support part. It can be proved (see Appendix F) that this lower bound is identical to the CCRB.

For the non-support part, a different set of testing points is used. As mentioned before, when one of the nonzero elements is small, the HCRB should be larger than the CCRB, and thus the nonzero components of small magnitude should be taken into consideration when constructing the set of $\mathbf{v}_i$. The following $\{\mathbf{v}_i\}_{i=1}^{n-s+1}$ is used:

$$\mathbf{v}_i = \begin{cases} x_q\mathbf{e}_{s+i} - x_q\mathbf{e}_q, & i = 1, \ldots, n - s, \\ t\mathbf{e}_q, & i = n - s + 1. \end{cases} \tag{79}$$

Here $t$ is also an arbitrary real number which will tend to zero after the covariance matrix is obtained. It is obvious that such $\{\mathbf{v}_i\}$ will not violate the sparsity of the testing points $\{\mathbf{x} + \mathbf{v}_i\}$. After the limit $t \to 0$ is taken, the last $n - s$ diagonal elements of the covariance matrix will be summed to represent the lower bound of the non-support part.

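The matrix $\mathbf{V}$ in (35) could be expressed as

$$\mathbf{V} = \left[t\mathbf{e}_q,\ x_q(\mathbf{e}_{s+1} - \mathbf{e}_q),\ \ldots,\ x_q(\mathbf{e}_n - \mathbf{e}_q)\right] = \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ t & -x_q\mathbf{1}^T \\ \mathbf{0} & x_q\mathbf{U}_{n-s} \end{bmatrix} \tag{80}$$

(we have slightly changed the order of the $\mathbf{v}_i$'s, which has no effect on the final result; the block rows correspond to the support entries other than $q$, the entry $q$, and the non-support entries), where the boldface $\mathbf{1}$ denotes the column vector $[1, 1, \ldots, 1]^T$. Next we have to compute the elements of the matrix $\mathbf{H}$, which are given as follows: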



X^X.t/ 



<q , X °q 

where it has been defined that 



exp(^( 2 %- 



- 1, 



(7 



(81) 



^ = ^ + ^(l|x + *eJl), 



(82) 



and 



^=^=^(-^+4(4-1 



at 



Hi j = exp 



t = 2, 
(1 + %)^ 



, n 



1. 



-1, 
+ 1, 



(83) 



(84) 



i,j = 2,...,n — s+ 1. 
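The matrix $\mathbf{H}$ could be represented as

$$\mathbf{H} = \begin{bmatrix} a(t) & b(t)\mathbf{1}^T \\ b(t)\mathbf{1} & \mathbf{D} \end{bmatrix}, \tag{85}$$

where $a(t) = H_{11}$, $b(t) = H_{12}$ and $\mathbf{D} = (H_{22} - H_{23})\mathbf{U}_{n-s} + H_{23}\mathbf{1}\mathbf{1}^T$. In order to check the existence and the expression of $\mathbf{H}^{-1}$, we first calculate the inverse of $\mathbf{D}$ by the Sherman-Morrison formula [33]:

$$\mathbf{D}^{-1} = \frac{1}{H_{22} - H_{23}}\left(\mathbf{U}_{n-s} - \frac{H_{23}}{H_{22} + H_{23}(n - s - 1)}\mathbf{1}\mathbf{1}^T\right). \tag{86}$$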

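Then the blockwise inversion formula is used for the calculation of $\mathbf{H}^{-1}$ (if it exists):

$$\mathbf{H}^{-1} = \begin{bmatrix} f_{11}(t) & f_{12}(t)\mathbf{1}^T \\ f_{12}(t)\mathbf{1} & \mathbf{D}^{-1} + \dfrac{b^2(t)\,\mathbf{D}^{-1}\mathbf{1}\mathbf{1}^T\mathbf{D}^{-1}}{a(t) - b^2(t)\,\mathbf{1}^T\mathbf{D}^{-1}\mathbf{1}} \end{bmatrix}, \tag{87}$$

where $f_{11}(t)$ and $f_{12}(t)$ are some functions of $t$. Because in the end only the last $n - s$ diagonal elements will be summed up, the specific forms of the two functions are of no importance. The existence of $\mathbf{H}^{-1}$ relies on whether the submatrices are valid. By employing (86), one has

$$\mathbf{D}^{-1}\mathbf{1} = \frac{1}{H_{22} + H_{23}(n - s - 1)}\mathbf{1}, \qquad \mathbf{1}^T\mathbf{D}^{-1}\mathbf{1} = \frac{n - s}{H_{22} + H_{23}(n - s - 1)}. \tag{88}$$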
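The Sherman-Morrison step can be verified numerically. The sketch below builds a matrix of the form $(H_{22} - H_{23})\mathbf{U} + H_{23}\mathbf{1}\mathbf{1}^T$, applies the closed-form inverse, and checks both the inverse and the quadratic form $\mathbf{1}^T\mathbf{D}^{-1}\mathbf{1}$. The numerical values of `h22`, `h23` and `size` are arbitrary test values, and the helper names are hypothetical.

```python
def closed_form_inverse(h22, h23, size):
    """Inverse of D = (h22 - h23) I + h23 * 1 1^T via the Sherman-Morrison formula."""
    a = h22 - h23
    c = h23 / (h22 + h23 * (size - 1))
    return [[((1.0 if i == j else 0.0) - c) / a for j in range(size)]
            for i in range(size)]

def build_d(h22, h23, size):
    """D has h22 on the diagonal and h23 everywhere else."""
    return [[h22 if i == j else h23 for j in range(size)] for i in range(size)]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

h22, h23, size = 3.0, 0.5, 6       # arbitrary values; size plays the role of n - s
d = build_d(h22, h23, size)
d_inv = closed_form_inverse(h22, h23, size)
product = matmul(d, d_inv)                       # should be the identity
quad = sum(sum(row) for row in d_inv)            # 1^T D^{-1} 1
print(quad, size / (h22 + h23 * (size - 1)))
```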



With these equations, it can be shown by tedious calculation 
that 

6 a (t)D- 1 ll T D- 1 



lim ■ . 
t->o a(t) 



/3(1 



-b 2 (t)l T T>- 



H 22 + H 23 (n - s - 1) 

x {[H 22 + H 23 {n - s - 1)](1 + 2na 4 J) 

.2, 



(89) 



(n - s)(3(l - 2atPY 



and that 



H 22 = lim D 

(n — s) cxp(— j3) 



exp 13 — 1 



^(QD^lTTp- 1 
o(t)-6 2 (t)l T D- 1 l 
1 

1 



exp/?(l- 5 (/3))- 



(90) 



where we have defined $\tilde{\mathbf{H}}_{22}$ as the limit of the $(2,2)$th submatrix of $\mathbf{H}^{-1}$, and the function $g(\beta)$ is defined as in (40). As will be shown later, under the modest requirement that $n \geq 2$, for any $\beta > 0$ one has $0 \leq g(\beta) < 1$, and thus the expressions presented above are all valid. In this way we have not only checked the invertibility of $\mathbf{H}$, but also found the expression of the limit of its inverse.

To get the final form of the HCRB, we calculate $\mathbf{V}\mathbf{H}^{-1}\mathbf{V}^T$ and sum up its last $n - s$ diagonal elements. By straightforward calculation it can be verified that this is just equal to $x_q^2\,\mathrm{tr}(\tilde{\mathbf{H}}_{22})$. Adding this to the lower bound of the support part, given by

$$\sum_{i=1}^{s} E[(\hat{x}_i - x_i)^2] \geq \mathrm{CCRB}_{\text{max supp}} = \sigma_x^2\left(s - \frac{2n\sigma_e^4\|\mathbf{x}\|_2^2}{\sigma_x^2 + 2n\sigma_e^4\|\mathbf{x}\|_2^2}\right), \tag{91}$$

one will finally get the expression (39).

The last part of this section deals with the properties of $g(\beta)$. The proof of the two limits given by (42) is straightforward. In order to prove (41), we consider separately the situations $0 < \beta \leq 1/(2\sigma_e^2)$ and $\beta > 1/(2\sigma_e^2)$. When $0 < \beta \leq 1/(2\sigma_e^2)$, one has

$$(1 - 2\sigma_e^2\beta)^2 < 1 < 1 + 2n\sigma_e^4\beta,$$

and together with $\beta < \exp\beta - 1$, it leads to the inequality $g(\beta) < 1$. When $\beta > 1/(2\sigma_e^2)$, it follows that

$$\frac{\beta}{\exp\beta - 1}\cdot\frac{(1 - 2\sigma_e^2\beta)^2}{1 + 2n\sigma_e^4\beta} < \frac{\beta}{\exp\beta - 1}\cdot\frac{4\sigma_e^4\beta^2}{2n\sigma_e^4\beta} = \frac{2}{n}\cdot\frac{\beta^2}{\exp\beta - 1}.$$

Because $\beta^2/(\exp\beta - 1) < 1$ for all $\beta > 0$, it can be seen that $g(\beta) < 1$ when $n \geq 2$. Combining the discussions of the two situations, we have proved that $g(\beta) < 1$ for all $\beta > 0$. The inequality $g(\beta) \geq 0$ for all $\beta > 0$ is trivial.

Appendix F 
Lower Bound of the Support Part 

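In this section, we prove that the lower bound of the support part obtained by using $\mathbf{v}_i = t\mathbf{e}_i$, $i = 1, \ldots, s$ and taking $t \to 0$ is just the CCRB of maximal support. It can be easily seen that the matrix $\delta\mathbf{m}_x$ in (68) is

$$\delta\mathbf{m}_x = t\begin{bmatrix} \mathbf{U}_s \\ \mathbf{0} \end{bmatrix}, \tag{92}$$

therefore the right side of (71) is

$$\delta\mathbf{m}_x\mathbf{H}^{\dagger}\delta\mathbf{m}_x^T = \begin{bmatrix} t^2\mathbf{H}^{\dagger} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}, \tag{93}$$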

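where the matrix $\mathbf{H}$ is given by (72). Next we calculate the limit of $\mathbf{H}/t^2$. It can be seen that

$$\frac{1}{t^2}\mathbf{H} = E\left[\frac{\delta\mathbf{p}}{pt}\,\frac{\delta\mathbf{p}^T}{pt}\right], \tag{94}$$

and for $\delta\mathbf{p}/pt$, one has

$$\left(\frac{\delta\mathbf{p}}{pt}\right)_i = \frac{1}{p(\mathbf{y};\mathbf{x})}\cdot\frac{p(\mathbf{y};\mathbf{x} + t\mathbf{e}_i) - p(\mathbf{y};\mathbf{x})}{t}. \tag{95}$$

Taking the limit $t \to 0$, one has

$$\lim_{t \to 0}\left(\frac{\delta\mathbf{p}}{pt}\right)_i = \frac{1}{p(\mathbf{y};\mathbf{x})}\lim_{t \to 0}\frac{p(\mathbf{y};\mathbf{x} + t\mathbf{e}_i) - p(\mathbf{y};\mathbf{x})}{t} = \frac{1}{p(\mathbf{y};\mathbf{x})}\frac{\partial p(\mathbf{y};\mathbf{x})}{\partial\mathbf{e}_i} = \frac{\partial\ln p(\mathbf{y};\mathbf{x})}{\partial\mathbf{e}_i}, \tag{96}$$

where $\partial/\partial\mathbf{e}_i$ denotes the directional derivative along $\mathbf{e}_i$ with respect to $\mathbf{x}$. Thus the limit of $\delta\mathbf{p}/pt$ is

$$\lim_{t \to 0}\frac{\delta\mathbf{p}}{pt} = [\mathbf{U}_s\ \ \mathbf{0}]\,\nabla_{\mathbf{x}}\ln p(\mathbf{y};\mathbf{x}), \tag{97}$$

and the limit of $\mathbf{H}/t^2$ is

$$\begin{aligned}
\lim_{t \to 0}\frac{\mathbf{H}}{t^2} &= E\left[\lim_{t \to 0}\frac{\delta\mathbf{p}}{pt}\,\frac{\delta\mathbf{p}^T}{pt}\right] \\
&= [\mathbf{U}_s\ \ \mathbf{0}]\,E\left[\left(\nabla_{\mathbf{x}}\ln p(\mathbf{y};\mathbf{x})\right)\left(\nabla_{\mathbf{x}}^T\ln p(\mathbf{y};\mathbf{x})\right)\right]\begin{bmatrix} \mathbf{U}_s \\ \mathbf{0} \end{bmatrix} \\
&= [\mathbf{U}_s\ \ \mathbf{0}]\,\mathbf{J}\begin{bmatrix} \mathbf{U}_s \\ \mathbf{0} \end{bmatrix}.
\end{aligned} \tag{98}$$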



Here the matrix $\mathbf{J}$ is just the Fisher information matrix, and thus it can be verified that the above matrix is invertible in the setting given by Section III. Substituting it into (93), and taking the sum of the first $s$ diagonal elements, one will get

$$\lim_{t \to 0}\mathrm{tr}\left(t^2\mathbf{H}^{-1}\right) = \mathrm{tr}\left\{\left([\mathbf{U}_s\ \ \mathbf{0}]\,\mathbf{J}\begin{bmatrix} \mathbf{U}_s \\ \mathbf{0} \end{bmatrix}\right)^{-1}\right\}. \tag{99}$$

Comparing this with the proof of Theorem 1, one will readily accept that the bound obtained is just the CCRB for the maximal support case.